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SECTION  I 
INTRODUCTION 

1.1  PASSIVE  ACOUSTIC  AIRBORNE  ASW 

•  The  airborne  ASW  tracking  problem  is  becoming  progressively  more  diffi- 

1 

cult  every  year.  The  continued  efforts  of  our  adversaries  toward  quieter, 
faster  assets  with  longer  ranged  weapons  imply  the  need  for  more  powerful 
localization  and  tracking  systems  able  to  extract  maximum  information  from 
lower  signal  strengths  and  to  obtain  fire  control  solutions  as  quickly  as 
possible.  Current  systems  work  by  extracting  bearing  and  Doppler  information 
from  narrow  band  signals  and  then  using  this  information  in  Extended  Kalman 
filter  tracking  algorithms.  To  operate  successfully  such  systems  require  a 
stable  narrow  band  line  and  relatively  high  signal-to-noise  ratio  (narrow  band 
signal  strength  to  background  noise  strength).  Unfortunately,  these  require¬ 
ments  are  becoming  increasingly  more  difficult  to  satisfy  as  targets  become 
quieter. 

Several  new  sensors  planned  for  the  future  (e.g.,  various  types  of  array 
buoys)  will  help  prolong  the  usefulness  of  current  tracking  systems,  but  it 
is  clear  that  within  very  few  years  new  tracking  systems  will  be  required  to 
exploit  many  different  types  of  measurements  and  to  operate  at  lower  SNR 
levels  than  possible  with  current  systems.  •  _ 

The  key  mathematical  problem  of  passive  ASW  localization  and  tracking 
is  the  estimation  of  the  state  of  time-varying  stochastic  processes  given 
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nonlinear  measurements  with  low  SNR.  A  theoretical  framework  does  exist  for 
such  problems,  at  least  at  the  abstract  level,  in  the  current  theory  of  non¬ 
linear  estimation.  Unfortunately,  this  theory  has  very  little  to  say  at  the 
practical,  computational  level  about  nonlinear  problems  in  general.  Neverthe¬ 
less,  useful  theoretical  results  are  possible  for  nonlinear  filtering  problems 
if  additional  mathematical  structure  is  present,  and  we  believe  that  the  pas¬ 
sive  tracking  problem  of  airborne  ASW  does  have  significant  special  structure 
to  exploit.  The  research  discussed  here  identifies  one  type  of  special  struc¬ 
ture  (time-scale  perturbation)  and  shows  how  to  exploit  it  to  design  improved 
passive  tracking  systems. 

1.2  CONVENTIONAL  VERSUS  INTEGRATED  SYSTEM  ARCHITECTURE 

Current  passive  acoustic  tracking  systems  work  by  extracting  target 
parameters  (such  as  time  delay  or  Doppler  shift  of  a  narrow  band  emission) 
from  a  raw  signal  and  then  processing  those  parameters  to  obtain  estimates  of 
target  parameters  of  interest  (position  and  velocity).  These  two  types  of 
processing  are  referred  to  as  signal  processing  and  tracking.  Conventional 
system  designs  assume  that  signal  processing  and  tracking  can  be  performed 
sequentially  (see  Fig.  1-1  from  [1]).  The  front-end  signal  processor  is  usu¬ 
ally  designed  assuming  that  target  parameters  are  constant  over  time;  the 
back-end  tracker  is  designed  assuming  that  the  parameter  estimates  output  from 
the  signal  processor  are  direct  measurements  of  the  true  parameter  values  with 
the  addition  of  uncorrelated  measurement  noise. 

These  design  assumptions  are  based  on  the  physical  nature  of  observation 
and  target  processes:  that  is,  a  slowly  varying  target  process  modulates  a 
rapidly  varying  observation  process.  Thus,  the  conventional  system  design 
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Figure  1-1.  Generic  Passive  Sonar  Processing  System 
(Knight,  Pridham,  Kay,  1981) 


inverts  the  physical  model  —  i.e.,  a  fast  signal  processor  modulates  a  slow 


tracking  filter.  However,  these  design  assumptions  are  only  approximations 
and  the  segmented  system  is  only  an  approximation  of  the  optimal  processor  for 
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extracting  target  tracks  from  the  raw  acoustic  sensor  data.  In  an  ideal  world 
with  unlimited  computational  and  data  transmission  resources,  one  would  design 
an  optimal,  totally  integrated  tracking  system  that  takes  in  raw  hydrophone 
signals  from  every  sonobuoy  and  outputs  target  position  and  velocity  estimates 
(Fig.  1-2). 

TARGET  POSITIONS 
AND  VELOCITIES 

- ► 

R-1249 

Figure  1-2.  Optimal,  Integrated  Processing  System  Architecture 


HYDROPHONE 

SIGNALS 

- > 
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We  know  what  the  optimal  design  is  —  it  is  the  implementation  of  Bayes 
rule.  One  way  to  realize  this  optimal  algorithm  is  to  discretize  the 
continuous-valued  variables  and  treat  the  problem  as  a  finite  state  Markov 
estimation  problem.  As  the  discretization  becomes  more  refined,  this  approx¬ 
imation  comes  closer  to  the  optimal  Bayesian  algorithm.  Unfortunately,  the 
computational  complexity  also  increases  so  rapidly  that  it  quickly  overwhelms 
any  imaginable  processor  for  all  but  problems  of  small  dimensions  (at  most  2 
or  3).  Numerical  and  processor  work  on  this  problem  is  still  advancing  [2] 
but  it  is  clear  that  this  completely  optimal,  totally  integrated  approach  will 
have  to  be  used  in  conjunction  with  methods  that  decompose  large  dimensional 
problems  into  a  collection  of  small  dimensional  problems  which  can  be  solved 
separately  and  then  recombined  to  obtain  a  tractable  solution  to  the  overall 
high  dimensional  problem. 

Such  decompositions  define  corresponding  system  architectures,  i.e.,  a 
collection  of  components  (which  solve  the  small  dimensional  subproblems) 
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together  with  the  inputs  and  outputs  of  each  component  (and  thus,  connections 
between  components).  For  example,  Fig.  1-3  shows  a  typical  passive  tracking 
system  architecture  with  two  components,  a  front-end  signal  processor  that 
estimates  bearing  and  frequency  lines  from  raw  measurements  and  a  back-end 
tracking  algorithm  that  estimates  position  and  velocity  from  bearing  and  fre¬ 
quency  inputs.  Figure  1-4  shows  another  system  architecture.  It  adds  a  feed¬ 
back  connection  from  the  back-end  tracker  to  the  front-end  signal  processor. 
This  feedback  can  enhance  the  signal  processing  by  providing  estimates  of 
expected  Doppler  shift  and  bearing  to  the  front  end.  However,  if  the  tracker 
back  end  is  producing  poor  position  and  velocity  estimates,  this  feedback  may 
in  fact  worsen  the  performance  of  the  front-end  signal  processor.  How  does  the 
system  of  Fig.  1-4  decide  when  to  switch  feedback  on  or  off?  More  generally, 
how  does  one  decide  which  architecture  is  the  best  one  to  use  in  a  particular 
situation?  Are  frequency  and  bearing  outputs  the  best  ones  to  provide  to  the 
back-end  tracker  in  Fig.  1-4?  If  the  architecture  of  Fig.  1-4  is  used,  what 
is  the  best  feedback  information  that  the  back-end  tracker  can  provide  the 
front-end  signal  processor  in  order  to  tell  the  front  end  when  it  should  and 
when  it  should  not  use  this  information  to  enhance  its  signal  processing?  The 
objective  of  this  research  is  to  show  that  we  can  apply  stochastic  perturba¬ 
tion  theory  to  answer  such  questions  as  these  in  a  precise,  systematic  manner. 


R-1Z48 


Figure  1-3.  Suboptimal,  Segmented  Processing  System  Architecture 
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Figure  1-4.  Segmented  Processing  System  Architecture  with  Feedback 

1.3  FILTERING  PROBLEMS  WITH  TIME  SCALES 

Our  objective  is  to  employ  some  systematic  mathematical  methods  to  iden¬ 
tify  architectures  (that  is,  components,  inputs,  and  outputs)  which  perform 
close  to  optimal,  totally  integrated  systems.  Our  approach  to  doing  this  is 
to  formulate  the  complete  tracking  problem  as  a  mathematical  estimation  prob¬ 
lem,  identify  perturbation  parameters  in  the  mathematical  model,  use  methods 
of  stochastic  perturbation  theory  to  decompose  the  estimation  problem,  and 
interpret  the  decomposition  in  terms  of  a  corresponding  system  architecture. 

The  specific  perturbation  parameter  we  will  exploit  in  this  work  is  the 
ratio  of  the  signal  time  constant  to  the  target  time  constant  which  is  often 
small.  The  conventional  segmented  signal  processing  and  tracking  system  is  an 
approximation  based  on  this  time-scale  separation.  The  goal  of  our  research 
is  to  obtain  time-scale  approximations  of  optimal  filters  systematically  and 
rigorously  from  a  mathematical  analysis  of  the  equations  modeling  the  target 


and  signal. 
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1.4  SIGNIFICANCE  OF  PERTURBATION  METHOD 

We  have  mentioned  only  one  kind  of  perturbation  (namely  time-scale  per¬ 
turbation)  so  far,  but  we  believe  that  Several  other  useful  perturbations  can 
be  identified  in  the  mathematical  models  used  in  passive  acoustic  tracking. 

In  this  subsection  we  will  discuss  the  significance  of  developing  architecture 
decomposition  methods  based  on  stochastic  perturbation  theory. 

1.4.1  Computation  Reduction 

As  we  noted  at  the  beginning  of  this  section,  the  purpose  of  this  method 
is  to  decompose  high  dimensional  problems  into  Lractable  lower  dimensional 
problems.  What  is  significant  is  that  perturbation  theory  provides  a  system¬ 
atic,  general  way  of  doing  this.  Moreover,  this  approach  provides  decomposi¬ 
tions  which  are  small  perturbations  of  the  optimal,  totally  integrated  system. 
How  much  performance  loss  results  depends  on  the  size  of  the  perturbation 
parameter.  Although  quantitative  performance  analysis  is  far  from  easy  even 
in  these  cases,  the  method  does  provide  an  approach  to  identifying  decompo¬ 
sitions  which  promise  minimal  loss  of  performance. 

1.4.2  Hierarchies  of  Architectures 

The  perturbation  method  also  naturally  provides  hierarchical  families  of 
architectures  in  terms  of  asymptotic  expansions  of  optimal  solutions.  That 
is,  one  can  identify  the  simplest  architectures  resulting  from  assuming  that 
the  perturbation  parameters  are  negligible,  but  also  one  can  identify  more 
complex  architectures  which  may  also  have  better  performance  when  the  perbur- 
bation  parameter  is  not  negligible.  For  example,  it  seems  likely  that  the 
architecture  shown  in  Fig.  1-3  is  a  zero-order  approximation  of  the  optimal 
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system  when  the  perturbation  parameter  is  negligible.  We  also  conjecture  that 
some  variation  of  the  architecture  shown  in  Fig.  1-4  is  a  first-order  approx¬ 
imation  of  the  optimal  system.  What  we  do  not  know  is  what  the  exact  feedback 
information  should  be  in  Fig.  1-4,  but  we  expect  stochastic  perturbation 
theory  to  give  us  the  precise  answer. 

1.4.3  Nonintuitive  Architectures 

In  many  problems  experienced,  intuition  suffices  to  determine  zero-order 
and  sometimes  high  order  perturbation  approximations.  Perturbation  theory  is 
much  more  complicated  for  stochastic  problems,  and  we  believe  that  a  system¬ 
atic,  mathematical  method  such  as  we  propose  will  have  a  tremendous  advantage 
over  unaided  intuition.  Reference  [3]  gives  some  examples  of  nonintuitive 
stochastic  perturbation  results. 

1.4.4  Reduction  to  Component  Design 

The  object  of  architecture  decomposition  is  to  reduce  the  problem  of 
designing  a  tracking  system  to  that  of  designing  simpler  components  which  are 
connected  together  as  specified  by  the  architecture.  Architecture  decomposi¬ 
tion  will  specify  these  components  in  terras  of  an  estimation  subproblem  that 
the  component  needs  to  solve.  The  solution  of  the  subproblem  would  generally 
be  determined  by  other  methods,  and  in  many  cases  it  may  happen  that  methods 
already  exist  for  solving  these  subprobleras.  Thus,  the  architecture  decom¬ 
position  shows  how  to  utilize  these  existing  results  in  a  unified  tracking 
system  to  solve  the  problem  at  hand.  For  example,  architecture  decomposition 
would  indicate  that  the  narrowband  problem  above  has  the  architecture  of 


Fig.  1-3.  This  architecture  would  specify  the  frontend  only  as  something 
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which  tries  to  estimate  constant  frequency  and  bearing  from  hydrophone  data. 
The  system  designer  is  then  free  to  note  that  there  already  exist  various 
solutions  to  that  problem  (e.g.,  maximum  likelihood  estimators  or  maximum 
entropy  methods)  and  would  substitute  the  appropriate  module  in  his  system. 

If  the  architecture  indicates  that  this  subproblem  is  particularly  difficult 
(for  example,  the  architecture  analysis  might  indicate  that  the  subproblem 
involves  a  low  signal-to-noise  ratio),  then  the  system  designer  would  know  he 
should  search  for  a  more  sophisticated  front  end. 

1.5  OVERIVEW  OF  REPORT 

Section  2  of  this  report  describes  nonlinear  filtering  models  for  passive 
acoustic  tracking  which  we  will  use  as  a  basis  for  our  study  of  time-scale 
approximations  and  filter  architectures.  Section  3  briefly  reviews  the  funda¬ 
mentals  of  stochastic  estimation  for  dynamical  systems  in  terms  of  stochastic 
differential  equations.  In  Section  4  we  review  previous  work  on  problems  sim¬ 
ilar  to  ours  in  the  areas  of  singular  estimation  and  control,  and  singularly 
perturbed  estimation  and  control.  Section  5  then  contains  a  discussion  of 
existing  techniques  for  the  type  of  perturbation  problem  arising  in  passive 
acoustic  tracking;  this  section  discusses  several  filter  architectures  that 
are  implied  by  the  different  techniques  reviewed  in  Section  4.  One  of  the 
conclusions  of  this  section  is  that  there  are  essentially  no  existing  results 
for  the  class  of  models  that  includes  those  arising  in  passive  ASW  tracking. 
The  remainder  of  the  report  then  deals  with  the  development  of  such  results. 

In  Section  6  we  introduce  two  relatively  simple  nonlinear  filtering  problems 
possessing  the  same  qualitative  features  found  in  passive  ASW  problems.  We 
also  motivate  and  define  a  number  of  approximate  solutions  to  these  problems 
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and  discuss  their  architectural  implications.  Section  7  contains  the  theo¬ 
retical  analysis  of  one  of  these  approximations  that  provides  a  precise  rela¬ 
tionship  between  process  time  scales  and  measurement  quality.  The  extensive 
simulations  in  Section  8  not  only  support  the  result  of  Section  7  but  also 
lead  to  a  number  of  additional  conclusions  including  several  concerning  the 
structure  and  asymptotic  properties  of  front-end/back-end  architectures. 

The  body  of  the  report  then  concludes  with  a  brief  review  and  discussion  in 
Section  9.  Appendix  A,  B,  and  C  contain  technical  derivations  pertinent  to 
Sections  2,  3,  and  6,  respectively. 
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SECTION  2 

PASSIVE  ACOUSTIC  TRACKING  MODELS 


2.1  INTRODUCTION 

This  section  will  describe  a  model  for  passive  acoustic  tracking  prob¬ 
lems,  which  has  been  used  for  passive  tracking  algorithms  in  the  past  and  is 
useful  for  applying  the  nonlinear  filtering  approach  we  are  taking.  The  basic 
simplifying  assumption  is  that  acoustic  signals  travel  in  straight  lines  from 
transmitter  to  receiver  at  a  constant  speed.  From  this  assumption,  we  develop 
simple  models  for  source  and  sensor  motion  effects,  source  aspect  angle  depen¬ 
dence,  sensor  directivity,  and  attentuation.  To  be  sure,  sound  propagation  in 
the  ocean  is  much  more  complex  than  the  model  of  it  presented  here  [4] .  But 
including  a  more  realistic  model  of  sound  propagation,  although  possible,  is 
well  beyond  the  scope  and  needs  of  this  research.  The  model  presented  here 
has  been  sufficient  to  guide  and  test  our  research  into  new  methods  of  com¬ 
bined  signal  processing  and  tracking  which  may  prove  useful  for  practical 
passive  acoustic  tracking. 

2.2  TIME  DELAY,  DOPPLER  EFFECT,  AND  SOURCE-SENSOR  MOTION 
Define  the  following  notation  (see  Fig.  2-1): 

x-p(t)  =  Transmitter  location  at  time  t; 
xR(t)  =  Receiver  location  at  time  t; 
yp(t)  =  Signal  transmitted  at  time  t; 
yg(t)  =  Signal  received  at  time  t. 
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Define  x(t)  implicitly  as 


t(t)  =  Delay  in  signal  received  at  time  t; 
=  ~7T-  |xT(t-T(t>)  -  xR(t)|  . 


(2-1) 


The  basic  relationship  between  yx  and  yR  in  this  case  is 


yR(t)  =  yx(t-T(t)) 


(2-2) 


The  time  derivative  x  of  the  time  delay  x  and  the  Doppler  effect  factor  1-x 
are  of  particular  interest.  These  are  given  by  the  following  equations 
obtained  by  implicit  dif ferentation  of  Eq.  2-1: 


<xT(t-x)  -  xR(t),xT(t-x)  -  xR(t)> 


(2-3) 


c  |xT(t-x)  -  xR(t) 


1  + 


<xx(t-x)  -  xR(t),xT(t-x)> 


c  |xT(t-x)  -  xR(t) 


where  <  ,  >  denotes  the  inner  product  of  vectors,  and 


1-x 


<xT(t-T)  -  xR(t),xR(t-x)> 


1  + 


c  )xx(t-x)  -  xR(t) 


(2-4) 


<xx(t-x)  -  xR(t),xT(t-x)> 

I  1  + - 

L  c  |xX(t-x)  -  XR(t)| 


If  the  sensor  is  motionless,  then  Eqs.  2-3  and  2-4  reduce  to 


<xT(t-x)  -  xR,xT(t-x)> 


T  = 


(2-5) 


C  |xX(t-T)  -  XR| 


<xT(t-x)  -  xR,xT(t-x)> 


1  + 


c  |xT(t-x)  -  xR| 


13 


ALPHATECH,  INC 


l-T  = 


(2-6) 


<xT(t-x)  -  xR,xT(t-r)> 


C  |xX(t-t)  -  XR| 


If  the  target  is  motionless,  then  Eqs.  2-3  and  2-4  reduce  to 


-  <xT  -  xR(t),xR(t» 


c  jxT  -  XR(t)j 


(2-7) 


l-T  -  1  + 


<xx  -  XR(t),XR(t)> 


c  |xT  -  XR( t) I 


(2-8) 


The  time  delay  model  presented  here  in  Eq.  2-2  is  an  approximation  to 


the  exact  solution  of  the  three-dimensional  wave  equation  for  a  moving  point 


source.  The  exact  solution  for  the  case  of  a  stationary  receiver  (xR  =  0)  is 


derived  in  Appendix  A  and  is  given  by 


yR(t)  =*  yx(t-t)  • 


(1-r) 


(2-9) 


where  yx(t)  is  the  sound  pressure  level  at  the  point  source  and  yR(t)  is  the 


sound  pressure  level  at  the  receiver. 


2.3  ATTENUATION 


Real  signals  are  attenuated  as  they  propagate.  Such  effects  can  be  added 


in  the  time  delay  model  by  defining  an  attenuation  function 


A(xx,xR)  =  ratio  of  transmitted  to  received 
signal  amplitude  for  transmitter 
at  xx  and  receiver  at  xR. 


The  corresponding  received  signal  model  is 


yR(t)  =  A( xT( t-t) ,xR( t ) )  yX(t-x)  . 


(2-10) 
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The  attenuation  function  might  satisfy  a  power  law 

A(xt,xr)  =  |xT  -  xr|Y  (2-11) 

where  y  =  -2  for  spherical  spreading  (propagation  in  three  dimensions),  y  =  -1 
for  cylindrical  spreading  (propagation  in  two  dimensions),  or  something  in 
between  (-2  <  y  <  -l).  Urick  [4]  notes  some  values  of  y  to  use  in  the  sonar 
propagation  loss  function  (which  is  essentially  what  is  being  modeled). 

Note  that  the  exact  solution  (Eq.  2-9)  for  the  homogeneous  three- 
dimensional  case  includes  the  geometric  attenuation  due  to  spherical  spreading 
(the  factor  appears  only  as  (ct)-^  =  r”l  attenuation  of  sound  pressure  level). 

2.4  ASPECT  DEPENDENCE 

Figure  2-2  shows  the  geometry  of  aspect  dependence  of  the  transmitted 

signal  y-p.  Note  that  in  this  section  we  assume  that  receiver  and  transmitter 

are  moving  in  a  plane.  The  notation  is 

Xf(t)  **  Transmitter  location  at  time  t; 

xR(t)  =  Receiver  location  at  time  t; 

yT(t)  =  Signal  transmitted  at  time  t; 

yR(t)  =  Signal  received  at  time  t; 

(j>a(t)  =  Acoustic  (delayed)  aspect  angle 
of  target  at  time  t; 

x(t)  =  Delay  in  signal  received  at  time  t; 

g(<ji)  =  Directivity  of  transmitted  signal 
relative  to  target  heading. 

The  transmitted  and  received  signals  are  related  by 

yp(t)  =  g($a(t))  y-r(t-T)  .  (2-12) 

15 


I 
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Note  that  *pa  satisfies 


cos  (lj)a(t)  ) 


-  <xT(t-x)  -  xR(t),xT(t-x)> 
|xT(t-x)  -  xR(t)|  |xT(t-x)| 


(2-13) 


If  the  source  transmits  several  different  signals  yx>R  with  different  direc¬ 
tivity  functions  gR,  then  the  received  signal  is  the  superposition  of  signals 
(Eq.  2-12).  That  is, 


yR(t)  =  I  gk(<t>a(t))  yT,k(t-T)  . 
k=l 


(2-14) 


To  treat  transmitters  and  receivers  moving  in  three  dimensions,  define 


b<j(t)  =  delayed  bearing  directional  vector  at  time  t, 


xT(t-x)  -  xR(t) 
|xT(t-x)  -  xR( t ) | 


vT(t)  =  target  heading  directional  vector  at  time  t, 
xT(t) 

|iT(t)| 


The  source  directivity  g  should  be  a  function  of  b  and  vx  so  that  the  received 


signal  is  given  by 


yR(0  =  g(bd( c ) » vt( t~t) )  yT(t-f)  . 


(2-15) 


If  the  receiver  is  motionless,  then  the  delayed  bearing  is  related  to  the 
instantaneous  bearing  by  the  equation 


bd( t )  =  b(t-x) 


(2-16) 


where  b( t )  is  defined  as 


ALPHATECH,  INC 


b(t)  =  instantaneous  directional  bearing  vector, 
xT(t)  -  xR 
|xT(t)  -  XR| 


2.5  SENSOR  DIRECTIVITY 

Sensor  directivity  can  be  treated  similar  to  aspect  angle  dependence. 

Figure  2-3  shows  the  geometry;  as  in  the  previous  section,  we  assume  that 

receiver  and  transmitter  are  coplanar.  Define  the  notation 

xT(t)  =  Transmitter  location  at  time  t; 

xR(t)  =  Receiver  location  at  time  t; 

y-f(t)  =  Signal  transmitted  at  time  t; 

yR(t)  =  Signal  received  at  time  t; 

Sa(t)  =  Acoustic  (delayed)  bearing  angle 
to  target  at  time  t; 

x(t)  =  Delay  in  signal  received  at  time  t; 

h( <j>)  =  Directivity  of  received  signal 
relative  to  receiver  velocity 
or  fixed  reference  direction. 

The  relation  between  transmitted  and  received  signals  is  similar  to  Eq.  2-12 


yR(t)  -  h(6a(t>)  yT(t-x) 


Note  that  f}a  satisfies 


cos(ga(t))  = 


<xT(t-x)  -  XR(t),XR(t)> 
|xT(t-x)  -  xR(t)|  |xR(t)| 


If  the  receiver  is  not  moving,  replace  xR  by  a  reference  vector  in  Eq.  2-18. 
A  source  transmitting  multiple  signals  is  modeled  by 


yR(t)  =  h(ga(t))  l  yT,k(t_f) 
k=l 
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Figure  2-3.  Sensor  Directivity 
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2 .6  NOISE  MODEL 

In  this  section  we  will  only  present  a  model  for  ambient  background 

noise,  which  we  model  as  an  additive  noise  term  in  the  equation  for  the 

received  signal.  Define 

n(t)  =  Ambient  background  noise  near  the 
receiver  at  time  t; 

yx(t)  =  Signal  transmitted  at  time  t; 

yR(t)  =  Signal  received  at  time  t; 

r(t)  =  Delay  in  signal  received  at  time  t; 
relative  to  receiver  velocity  or 
fixed  reference  direction. 

The  relationship  between  the  received  signal  and  the  transmitted  signal  and 
background  noise  is  given  by 


yR(t)  =  yr(t-T)  +  n(t)  . 


(2-20) 


Note  that  the  noise  process  n(t)  in  Eq.  2-20  is  not  delayed  since  it  does  not 
originate  from  the  target.  Note  also  that  nothing  is  said  about  how  n(t) 
depends  on  the  receiver  location.  One  can  include  a  model  of  the  correlation 


such  as 


E{nj.(t)n2(s) }  =  R(x1(t)-x2(s),t-s  ) 


(2-21) 


for  two  receivers  of  the  same  type  located  at  x^  and  x2. 


2.7  EXAMPLE 

To  illustrate  the  time  delay  model  described  above,  consider  a  single 
source  moving  with  constant  velocity  and  emitting  a  narrowband  signal.  The 
source  trajectory  satisfies  the  differential  equation. 


vVfV'-y 
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(2-22) 


The  source  signal  satisfies  the  stochastic  differential  equation 


d<j>T  =  f  dt  +  o  dw 


(2-23) 


where  f  is  the  constant  source  frequency,  w  is  a  standard  Wiener  process,  and 
the  transmitted  signal  is 


yT(t)  =  sin  <j>x(t) 


(2-24) 


The  signal  y^  is  a  (wide-sense)  stationary  process  with  total  power  1/2  and 
two-sided  power  spectral  density  at  frequency  m  given  by 


-  +  f2  +  w2 

4 


21P 


(2-25) 


+  (f+m)^ 


This  expression  represents  a  spectrum  with  peak  at  frequency  f  (or  f  and  -f 
for  the  two-sided  spectrum)  and  width  proportional  to  o^ . 

Now  suppose  that  there  is  a  motionless  sensor  located  at  xR.  Consider 
a  "difar"  —  i.e.,  a  sonobuoy  with  (limited)  horizontal  directivity.  Here  we 
assume  a  two-dimensional  model  and  the  difar  measures  three  things: 
an  omnidirectional  measurement. 


dyR>1  =  Ai  yj(t-T)  dt  +  dnj  ; 


(2-26) 


and  two  direction  measurements, 


dyR>2  =  a2  cosfB(t-x))  yj(t-T)  dt  +  dn£ 


(2-27) 


g 
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dyR>3  =  A3  sin((3(t-x))  yx(t“T)  dt  +  dn3 


In  these  equations,  t  is  the  time  delay  obtained  from 


T  =  -  |xT(t~r)  -  xR| 

c 


The  bearing  angle  $  is  defined  by  the  equation 


(2-28) 


(2-29) 


cos ( 8( t ) )  xT(t)-xR 


(2-30) 


sin( 8( t) )  |xT(t)-xR| 


We  assume  that  n^ ,  n2 ,  n3  are  standard  Wiener  processes  so  that  the  parameters 
(Aj.)2,  (A2)2,  (A3)2  are  signal-to-noise  ratios  (signal  power  divided  by  noise 
power,  where  power  equals  mean  square  average).  Because  xj  is  a  constant 
velocity  trajectory,  we  can  use 


xj(t)  =  xj(0)  +  t  Xf(0) 


(2-31) 


to  solve  explicitly  for  t  as  a  function  of  t,  Xf(0),  and  xq-(O).  If  we  use 


the  notation 


v  =  xj(0) 


u  =  xT(0)-xR  +  t  xT(0)  , 


(2-32) 


(2-33) 


then  x  is  given  the  formula 


1  fl  ?  /  l  v ! 2  \  | u  | 2 

C2  U2  \  c2  /  c2  J 


(2-34) 


where  <v,u>  is  the  scalar  product  between  vectors  u  and  v. 
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We  will  now  show  how  to  express  this  model  in  terras  of  stochastic  differ¬ 
ential  equations  without  explicit  time  delays  such  as  t-T.  Let  <f>R(t)  denote 
<f>T(t-T).  Then  we  can  rewrite  Eqs.  2-26  ,  2-27  ,  and  2-28  as 


dyR,l  =  sin($R(t))  dt  +  dn! 

dyR,2  “  a2  cos(g(t-T))  sin($R(t))  dt  +  dn2 


dyR,3  =  a3  sin(6(t-r))  sin($R(t))  dt  +  dn3 


(2-35) 


(2-36) 


(2-37) 


According  to  McKean  [5,  p.  41],  $R(t)  satisfies  the  stochastic  differential 


equat ion 


d<)>R  *»  f(l-x)dt  +  a  (i-^)l/2  dw 


(2-38) 


where  w  is  a  standard  Wiener  process.  In  other  words,  the  Doppler  effect  not 
only  shifts  the  frequency  f  but  it  also  expands  the  bandwidth  of  the  signal. 
The  factor  (1-t)!/2  may  not  matter  much  if  t  is  very  small.  However,  for 
broadband  source  signals,  this  factor  is  very  critical  in  allowing  one  to 
estimate  any  Doppler  effects  from  the  received  signal. 

To  summarize,  the  time  delay  model  gives  us  the  following  differential 
equation  model  for  a  single  target  and  a  single  directional  sensor: 


(2-39) 


(2-40) 


<XT(t-T)  -  XR,XT(t-T)> 
c  |xT(t-i)-xR| 
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d<J>R  =  f(l-T)dt  +  o  (1-t)^'^  dw 


(2-41 


dyR(i  =  \\  sin(<j,R(t))  dt  +  dnx 


(2-42 


dyR>2  =  A2  cos(g(t—r))  sin($R(t))  dt  +  dn2 


(2-43 


dyR>3  =  A3  sin(e(t-T))  sin(<jiR(t))  dt  +  dn3 


(2-44 


where  t  is  given  by 


T  =  ~  |xT(t-T)  "  XR|  , 


(2-45 


and  g  is  given  by 


'(8(0) 


xT(t)-xR 


sin(g(t ) )  |xx(t)-xR| 


(2-46 


In  [6], [7)  we  used  a  similar  model  that  approximated  t  =  0  in  Eqs.  2-40,  2-43 
2-44,  and  2-45,  and  ignored  the  (1-t)^^  factor  in  Eq.  2-41.  This  gives  the 


approximate  model 


(2-47 


1  — T  = 


<xT(t)  -  xR,xT(t)> 
c  |xT(t)  -  XR| 


(2-48 


d<J>R  =  f(l-i)dt  +  a  dw 


(2-49 


dyR  1  =  Ai  sinf $R(  t))  dt  +  dn^ 


(  2-5C 


a 
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dyR>2  =  A2  cos(B(t)J  sin($R(tj)  dt  +  dn£ 


(2-51 


dyK>3  =  A3  s in( g( t )  )  sin($R(t))  dt  +  dn3 


(2-52 


We  will  use  this  example  to  motivate  our  study  of  filter  architectures  and 


time-scale  approximations  in  subsequent  sections  of  this  report.  In  the  next 


section  we  will  describe  perturbation  parameters  that  can  be  introduced  into 


this  model  and  which  we  will  exploit  to  obtain  reduced  order  approximations 


of  optimal  filters  for  problems  such  as  this. 
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SECTION  3 

STOCHASTIC  ESTIMATION  THEORY 

3.1  INTRODUCTION 

The  first  part  of  this  section  describes  some  special  features  of  the 
passive  acoustic  tracking  model  that  we  can  exploit  to  design  approximate 
optimal  filters.  These  features  are  not  limited  to  acoustic  tracking  problems 
and  they  are  present  in  many  other  signal  processing  and  estimation  problems. 
For  this  reason  we  will  formulate  a  general  class  of  stochastic  perturbation 
problems  which  will  form  the  basis  of  our  research.  Our  work  relies  on  the 
methods  of  stochastic  differential  equations  and  nonlinear  filtering  theory 
[8]  .  The  second  part  of  this  section  reviews  the  basic  models  and  results  of 
this  theory. 

3.2  FEATURES  OF  ACOUSTIC  TRACKING  MODELS 

The  example  model  described  in  subsection  2.7  has  a  general  quasilinear 
form  described  by  three  coupled  stochastic  differential  equations: 

dx  =  Fx  dt  +  G  dw  (3-1) 

dz  =  A(x)z  dt  +  B(x)  du  (3-2) 

dy  =  C(x)z  dt  +  D(x)  dv  (3-3) 

In  these  equations  w,  u,  v  are  assumed  to  be  independent  Brownian  motion 
processes.  The  first  equation  (Eq.  3-1)  describes  the  target  dynamics  and 
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the  second  two  equations  (Eqs.  3-2  and  3-3)  describe  the  dynamics  of  the 
observed  signal  y(t)  and  its  dependence  on  the  target  state  x(t).  The  matrix 
coefficients  A,  B,  C,  and  D  depend  on  the  target  state  x  in  order  to  account 
for  such  effects  as  Doppler  shifted  frequency,  directivity  of  the  sensor,  and 
aspect  dependence  of  the  target  emissions. 

Signal  processing  and  tracking  problems  often  have  a  signal  process  y(t) 
with  a  shorter  time  constant  than  the  target  process  x(t).  This  is  the  case 
in  passive  acoustic  tracking  where  the  target  dynamics  are  very  slow  relative 
to  the  acoustic  signal  dynamics.  We  can  make  this  time  scale  explicit  by  per¬ 
turbing  the  stochastic  differential  equations  as  follows. 


dx  =  Fx  dt  +  G  dw 

(3-4) 

e*dz  =  A(x)z  dt  +  du 

(3-5) 

e*dy  =  C(x)z  dt  +  e^/2.d(x)  dv 

(3-6) 

The  parameter  e  is  proportional  to  the  time  constant  of  the  signal  process. 
Note  that  the  square  root  arises  because  we  are  using  white  noise  (du  and 

dv)  to  drive  these  stochastic  differential  equations;  speeding  up  a  Brownian 
motion  is  mathematically  equivalent  to  scaling  its  magnitude. 

The  equations  in  Eqs.  3-5  and  3-6  are  singularly  perturbed.  That  is, 
their  small  e  *  0  behavior  cannot  be  approximated  by  simply  setting  e  =  0. 
Nevertheless,  there  are  ways  to  obtain  good  approximations  to  the  solution 
of  singularly-perturbed  problems  for  small  e.  There  is  a  vast  literature  on 
singularly-perturbed  differential  equations.  The  papers  [9], [10)  review  the 
work  relevant  to  control  and  estimation  applications.  We  will  review  work 


specifically  relevant  to  our  estimation  problem  in  the  next  section. 
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As  e  approaches  0  in  Eqs.  3-5  and  3-6,  the  rate  of  information  increases 
to  infinity.  For  example,  suppose  there  is  no  z  process  and  we  have  only  the 
two  simpler  equations 


dx  =  Fx  dt  +  G  dw  , 


e*dy  =  C(x)  dt  +  e^/^»D(x)  dv  , 


(3-7) 

(3-8) 


where  y  is  measured  and  we  want  to  estimate  x.  This  estimation  problem  is 
equivalent  to  one  with  equations 


BS  S 


•v . 


dx  =  Fx  dt  +  G  dw  , 


dy  =  C(x)  dt  +  e^^*D(x)  dv 


(3-9) 

(3-10) 


This  is  an  example  of  an  almost  singular  filtering  problem,  which  we  will  dis¬ 
cuss  in  Section  4.  Note  that  the  signal-to-noise  ratio  (Eq.  3-10)  is  propor¬ 
tional  to  Thus,  as  the  signal  process  becomes  faster  relative  to  the 

target  process,  the  SNR  of  the  signal  becomes  larger. 

Passive  acoustic  signals  typically  have  low  SNR,  and  a  model  which  pre¬ 
dicts  increasing  SNR  is  not  realistic.  Thus,  it  is  necessary  to  modify  the 
perturbation  above  to  control  the  SNR.  Note  that  the  noise  coefficients  B 
and  D  in  Eqs.  3-5  and  3-6  are  likely  to  be  large  and  we  cannot  let  the  time 
scale  e  be  arbitrarily  small  independent  of  these  noise  factors.  Thus,  we 
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If  8l(e)  =  g2(e)  =  e^^,  we  have  the  pure  time-scale  perturbation.  We  can 
account  for  greater  noise  by  allowing  different  functions  of  e.  For  example, 
g2(e)  =  1  causes  the  SNR  to  remain  constant  as  e  +0,  and  g2(  e)  =  causes 

SNR  to  decrease  to  0  as  £  ♦  0.  In  Section  4  we  will  survey  previous  work  on 
perturbed  filtering  problems  with  an  eye  to  different  possible  choices  for 
81»82*  Before  doing  that,  let  us  first  review  the  basic  results  of  nonlinear 
filtering  theory  which  is  the  basis  of  our  approach. 

3.3  FILTERING  MODELS  AND  EQUATIONS 

The  model  of  passive  acoustic  tracking  described  above  is  an  example  of 
a  general  class  of  nonlinear,  continuous  state,  continuous  time  processes 
driven  by  white  noise.  This  class  has  the  generic  form 

dx  =  f(x)  dt  +  g(x)  dw  (3-13) 

dy  =  h(x)  dt  +  dv  (3-14) 

where  w  and  v  are  independent  multidimensional  Brownian  motion  processes.  The 
filtering  problem  is  to  estimate  x(t)  given  the  measurements  y(s)  for  0<s<t. 
The  filtering  equations  in  the  general  nonlinear  case  are  expressed  are 
expressed  as  follows.  Let  $  be  a  function  of  the  state  variable  x.  Define 
Trt(i}>)  to  be  the  conditional  expectation  of  ij>(x(t))  given  the  measurements  up 
to  time  t.  The  filtering  equation  expresses  nt(4>)  in  terms  of  a  stochastic 
differential  equations  as  follows  [11], [12]: 

dnt((j>)  =  it t ( L<+> )  +  [ tt t C h<J> )  -  ut(h)irt($)]  dvt  (3-15) 

where  vt  is  the  innovations  process  satisfying  the  stochastic  differential 
equation 
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dvt  =  dy(t)  -  ut(h)dt  (3-16) 

and  L  is  the  infinitesimal  generator  of  the  process  x(t)  [11].  In  all  but  a 
few  cases,  the  solution  of  Eq.  3-15  requires  an  infinite  dimensional  calcula¬ 
tion.  Nevertheless,  there  are  several  types  of  additional  models  which  have 
finite  dimensional  solutions  and  that  can  be  used  to  help  gain  insight  into 
the  essential  behavior  of  the  filtering  problem.  In  this  subsection  we  will 
describe  these  models  and  the  related  filtering  equations  that  apply  to  them. 

The  simplest  case  is  the  familiar  linear  Gaussian  model  which  has 
equat ions 

dx  =  Ax  dt  +  B  dw  (3-17) 

dy  =  Cx  dt  +  D  dv  .  (3-18) 

In  this  case  w  and  v  are  assumed  to  be  independent  Brownian  motion  processes. 
The  optimal  filter  in  this  case  is  the  well-known  Kalman-Bucy  filter  [8]  which 
expresses  the  conditional  probability  density  of  x  in  terras  of  its  mean  x  and 
covariance  P.  These  satisfy  the  equations 

dx  =  Ax  dt  +  P(DRDT)_1CT(dy  -  Cxdt)  ,  (3-19) 

dP  =  (AP  +  PAt  +  BQBt  -  PCt(DRDt)-1CP)  dt  .  (3-20) 

Note  that  Eq.  3-19  is  a  stochastic  differential  equation  but  that  Eq.  3-20  is 
purely  deterministic. 

Linear  Gaussian  models  are  useful  because  they  can  often  be  used  in 

practice  to  solve  nonlinear  problems  by  linearization.  Note  that  the  acoustic 


tracking  model  is  quasilinear  as  indicated  in  Eqs.  3-1,  3-2,  and  3-3,  and 
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Linearization  techniques  may  be  helpful  in  this  case.  Nevertheless,  linear 
models  and  methods  are  limited  and  are  known  to  perform  poorly  in  nonlinear 
problems  with  low  SNR.  Furthermore,  methods  for  analyzing  linear  problems 
center  around  the  analysis  of  the  Riccati  equation  (Eq.  3-20)  and  do  not  gen¬ 
eralize  to  nonlinear  problems.  For  this  reason  we  have  also  chosen  two  types 
of  discrete  state  problems  to  analyze.  In  these  problems  x(t)  is  a  finite- 
state  continuous-time  Markov  process  and  the  filtering  equations  are  a  finite 
dimensional  system  of  stochastic  differential  equations. 

A  finite-state  Markov  process  is  described  in  terms  of  its  transition 
rate  raatrtix  A,  where  the  elements  1(^2  I  Cl)  A  are  the  transition  rates 

Pr{x(t+dt)  =  ?2  1  x(  t  )  =  Cl }  =  +  x^2|Cl)*dt  +  o(dt).  (3-21) 

If  pc  denotes  the  vector  of  probabilities  Pr{x(t)=£),  then  pt  satisfies  the 
linear  differential  equation 

dpt  =  Apt  dt  .  (3-22) 

In  the  first  discrete  model  we  use  the  same  measurement  equations  as 
Eq.  3-14,  and  it  is  discussed  in  [11]  —  [13]  .  The  filtering  equation  (Eq.  3-15) 
still  applies,  but  it  is  finite-dimensional  in  this  case.  Let  irt  denote  the 
vector  of  conditional  probabilities  Pr {x( t )=£ | y(s ) ,0<s<t } .  Then  satisfies 
the  stochastic  differential  equation 

dirc  =  Att t  +  [h*nt  -  tt t^hTi t ]  dv^  (3-23) 

where  h  is  the  vector  with  components  h(£)>  h*  is  the  diagnonal  matrix  with 
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The  second  type  of  discrete  model  we  will  investigate  has  discrete  mea¬ 
surements  as  well  as  a  discrete  state.  In  this  model  y(t)  is  given  by 


y(t)  =  h  ( x( t ) )  , 


(3-25 


and  we  call  this  a  partially  observed  Markov  process.  In  this  case  the  fil¬ 
tering  equation  is  a  finite-dimensional  stochastic  differential  equation 
driven  by  a  jump  process  Jt  (which  is  the  number  of  jumps  of  y(s)  in  the 
interval  [0,t]): 


diT(-(5)  =  Ft  dt  +  Gc  dJj- 


(3-26 


where  Fc  and  Gt  are  given  by 


where 


Ft  =  ^h(  5  )y  ( t )  *  [  ^  K€|C')irt(C,)-*t(0£  ^(y(  t )  |  £*)■»»,;(  C '  )]  (3-27 

V  V 


x(y(t)U)  =  Z  X(C’|0  , 

h(c')-y(t) 


z  xUlC'Jn-U') 

V 

=  6h(Oy(t)  *  — ; - — - -  “  "t-U) 

z  x(y(t)  |  c  )iTt-( C  ) 

C 


(3-28 


Appendix  B  derives  this  filtering  equation  and  discusses  several  variations 
of  the  model  and  of  the  filtering  equation  that  will  be  used  in  studying  this 
type  of  problem. 
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SECTION  4 

FILTERING  PROBLEMS  WITH  TIME  SCALES 


4.1  INTRODUCTION 

In  this  section  we  review  previous  work  on  stochastic  filtering  problems 
with  multiple  time  scales  that  has  some  relevance  to  our  problem.  Although 
no  previous  work  addresses  the  problem  of  interest  to  us,  some  of  this  work  is 
related  and  the  methods  used  serve  as  a  point  of  departure  for  our  research. 
Subsection  4.2  reviews  work  on  problems  of  estimation  when  some  state  compo¬ 
nents  are  known  exactly  (the  singular  estimation  problem)  or  very  accurately 
(the  almost  singular  estimation  problem).  Multiple  time  scales  arise  in  the 
solution  of  almost  singular  estimation  problems,  and  singular  perturbation 
methods  can  be  used  to  solve  such  problems.  As  noted  in  the  previous  section, 
the  almost  singular  estimation  problem  is  equivalent  to  a  simple  variation  of 
one  of  our  models.  Thus,  both  the  results  and  the  methods  are  of  interest. 
Subsection  4.3  discusses  work  on  estimation  problems  with  natural  time  scales 
—  that  is,  problems  in  which  the  time  scale  is  introduced  explicitly  as  a 
singularly-perturbed  filtering  problem.  Although  the  models  used  in  this 
work  do  not  quite  include  the  one  we  have  described  for  the  passive  acoustic 
tracking  problem,  they  are  closely  related  and  the  methods  of  analysis  are  of 
interest. 
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4.2  SINGULAR  AND  NEARLY-SINGULAR  FILTERING  PROBLEMS 
4.2.1  Introduction 

The  problem  of  interest  In  our  work  is  characterized  by  a  very  high 
intensity  for  the  measurement  noise,  but  with  a  long  time  horizon  to  perform 
tracking,  so  many  measurements  may  be  taken.  Ideally,  we  expect  a  tradeoff 
between  the  poor  measurement  quality  and  the  number  of  measurements  which  are 
available.  Since  this  is  the  main  distinguishing  feature  of  the  problem  from 
other  nonlinear  filtering  problems,  it  is  papers  in  this  area  that  are  most 
thoroughly  discussed. 

The  problem  of  poor  measurements  over  a  long  time  horizon  has  received 
little,  if  any,  attention  in  the  literature.  the  related  problem  of  estima¬ 
tion  with  good  measurements  over  a  very  short  period  of  time  has  been  dealt 
with  extensively.  In  addition,  the  dual  to  this  estimation  problem,  the  cheap 
control  problem,  has  received  significant  attention. 


4.2.2  Singular  Estimation  and  Control 

The  limiting  case  of  estimation  in  low  noise  (control  with  small  input 
penalty)  is  the  situation  where  noise  intensity  is  zero  (penalty  associated 
with  the  inputs  is  zero).  These  problems  are  known  as  the  singular  estima¬ 
tion  and  singular  control  problems,  respectively.  They  have  been  examined 
most  extensively  for  linear  Gaussian  models. 

For  the  case  of  linear  systems  with  a  continuous  state  space,  Willems, 
Kitapci  and  Silverman  [14]  investigated  the  singular  control  problem  while 
Schumacher  [15]  examines  singular  filtering.  Both  papers  deal  with  systems 
that  can  be  described  in  the  form: 
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some  of  the  coatroLs  are  associated  with  zero  penalty.  Therefore,  there  are 
two  subspaces  for  u,  one  in  which  infinite  magnitude  inputs  such  as  impulses 
are  allowed  (zero  cost  portion)  and  the  other  where  regular  (finite  magnitude) 
inputs  only  are  possible.  The  state  space  is  divided  into  subspaces  such  as 
the  output  nulling  subspace,  controllable  subspaces,  almost  nulling  subspaces 
and  combinations  thereof.  The  paper  shows  that  the  optimal  control  consists 
of  a  set  of  inputs  which  forces  the  state  to  jump  to  the  subspace  controllable 
by  regular  inputs.  The  subsequent  trajectory  of  the  state  remains  within  the 
subspace . 

This  compares  to  the  filtering  problem  of  Schumacher  where  the  uncor¬ 
rupted  subspace  was  estimated  exactly  and  instantly,  while  the  variables  in 
the  remaining  subspace  were  estimated  over  a  finite  time  interval,  with  imper¬ 
fect  accuracy,  by  linear  filtering  techniques. 

The  singular  filtering  problem  for  discrete  time  linear  systems  has  been 
addressed  by  Shaked  [16].  The  system  investigated  is  of  the  form 

=  Axi  +  Bw| 
yi  =  Cxi  +  vt 

where  E[v^Vj]  =  r6j[ j  and  r  +  0. 

The  paper  distinguishes  between  the  cases  of  uniform  and  nonuniform  rank. 
A  uniform  rank  system  has  the  property 

CA^-B  =  0  for  i  =  0, 


and 


de  1 1  CA^-B  |  *  0 
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where  xj  are  the  states  which  can  be  estimated  quickly  and  yj  are  the  outputs 
which  are  corrupted  only  by  a  low-intensity  noise  process.  A  further  assump¬ 
tion  on  Bff(e)  and  Dff(e)  and  a  scaling  of  the  state  and  output  variables 
yields  a  system  of  the  form: 


e  1Aff  e  1/2Afs 
0(e1/2)  Ass 


The  resulting  perturbations  in  the  dynamics  of  the  system  allow  Krener  to 
approximate  the  solutions  to  the  entire  Riccati  equation.  He  solves  smaller- 
order  problems  by  expanding  the  solution  in  a  power  series  and  keeping  only 
leading-order  powers  of  e.  The  solution  demonstrates  that  there  are  two  rates 
associated  with  the  dynamics  of  the  Riccati  equation  when  some  states  are 
corrupted  by  small  noise. 

Krener  extends  the  problem  to  the  case  where 
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dXf  Aff  Afs  Xf  0  Bff  U  dwf 

=  dt  +  dt  + 

dxs  Asf  Asg  xs  «s(xf)  0  I  dws 


and  shows  that  the  optimal  filter  can  be  approximated  by  the  linearized  ver¬ 
sion,  where  as(xf)  is  replaced  by  as(xf)  and  Xf  is  the  estimate  of  the  vari¬ 
ables  which  can  be  quickly  estimated. 

The  cheap  control  problem  for  linear  systems  has  been  investigated  by 
Sannuti  [18]-[21].  In  [18]  and  [21]  specifically,  the  linear  regulator 
problem  is  considered  in  which  the  controls  are  penalized  by  an  amount  which 
is  0(e)  =  O(p^)  smaller  than  the  penalty  associated  with  outputs  of  equivalent 
magnitude.  Therefore  we  have  a  system  of  the  form: 


x  =  Fx  +  Gu 


y  =  Hx 


1  cf 

J  =  — 2~  xT  S(tf)x  +  j  |yTAy  +  y2uTgu  j  dt 


In  [22],  Sannuti  and  Wason  demonstrate  how  an  invertible  system  can  be 
put  Into  an  "almost  observable"  form  using  an  invertible  linear  transforma¬ 
tion.  The  system  can  be  written  as  the  following  when  in  this  form: 


xq  =  Aqoxq  +  A01xl 


x^,j—  y'  AijXj  +  D^u  ,  i  1 ,  • .  .  , k 
j=0 


xib  =  aixi  +  x1+1 


j  i  1  j  •  •  •  J  k.  1 


y  =  HxI 
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The  structure  generated  by  this  form  is  shown  in  a  block  diagram  in  Fig. 
4-1.  The  key  feature  to  this  structure  is  the  number  of  integrations  along 
each  path  from  input  to  output.  In  general,  there  are  paths  with  1,2,  ... 

K-l  and  K  integrators  in  series.  Given  the  representation  above  and  the  cost 
function,  Sannuti  shows  that  the  magnitude  of  the  optimal  controls  will  be 
0(y-l).  Furthermore,  the  high  gain  of  0(p~^)  will  be  evenly  distributed 
across  the  integrations  of  each  path. 

Therefore,  for  the  forward  path  with  three  integrators,  the  inputs  to  the 
integrators  will  be  0(p“l),  0(y-^/-),  and  as  shown  in  Fig.  4-2.  In 

the  additional  dynamics  for  xq,  the  integrator  input  is  the  same  order  as  y. 

To  understand  the  effect  on  the  dynamics  of  the  control  law,  we  recognize 
that  the  integrator  inputs  are  the  derivatives  of  the  state  variables,  so  that 
the  dynamics  are  clearly  speeded  up.  Sannuti  proceeds  to  scale  the  state 
variables  and  controls  so  that  the  new  inputs  to  the  integrators  are  all  of 
the  same  order.  For  the  case  of  a  three-integration  chain  we  can  consider  a 
simple  example. 


*1 

0 

1 

0 

*1 

0 

*2 

= 

0 

0 

1 

*2 

+ 

0 

*3 

0 

0 

0 

ro 

1 

Sannuti 


Optimal  Input  Across  the  Integration  Chain 


2M  ft 


*  .. 

•«  « 
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Under  Che  optimal  control  we  will  have: 


0(u)  =  0(irl ) 


0(x3)  =  0(u"2/3) 


0(x2)  =  OCii-l/-3) 


0(xj_)  =  0(y)  =  0(1) 


To  scale  all  the  derivatives  to  order  0(p  1/3)  we  let 


p 2 / 3 


yl/3  x2 


to  obtain: 


,1/3  i. 


0  1  O  x 


x2  =  0  0  1  x2  +  0u 


0  0  0  x3 


J  =  /  (  yy  +  uTu  )  dt  . 


Therefore  the  result  of  the  y2  t^e  original  cost  function  is  a  time  scaling 

of  u 1 / 3 .  In  the  case  where  there  are  k  integrations  in  series,  the  scaling 
will  be  u 1 / k •  Therefore,  for  systems  with  more  than  one  forward  path  (multi¬ 


ple  rank)  there  will  be  multiple  time  scales  on  which  system  behavior  occurs. 


For  the  additional  system, 


x  =  A01xl  +  A00x0 
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we  let 

K0  =  *0 

and  therefore  there  is  no  time  scaling  effect. 

Due  to  the  duality  between  estimation  and  control  problems,  we  can  expect 
a  similar  phenomena  to  occur  in  estimation  problems.  The  above  phenomena 
resulted  from  the  term  in  the  penalty  function.  The  dual  condition  for 

the  estimation  problem  would  be  that 

cov[v(t)v(s)]  =  jj^RdCt.s) 


which  corresponds  to  the  case  of  very  good  measurements  [23,  p.  270]. 

To  determine  the  behavior  of  the  estimation  problem,  we  start  by  w. rking 
with  general  control  and  estimation  problems,  and  generate  the  dual  of  the 
cheap  control  problem. 

CONTROL/ESTIMATION  DUALITY 

To  obtain  a  general  relationship  between  the  (nearly)  singular  control 
and  (nearly)  singular  filter  problems,  we  need  to  express  the  systems  in  the 
general  form: 


CONTROL 


ESTIMATION 


x 


Fx  +  Gu 


x  =  F^x  + 


y  =  Hx 


y  =  GTX 


V 


J 


x(tf)TS(tf )x(tf  ) 


yTAy  +  u^Bu  dt 


cov(w)  =  A  =  Q 
cov(v)  =  B  =  R 


Therefore,  given  the  estimation  problem: 
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x  =  Fx  +  Hw  ;  cov(w)  =  I 
y  =  Gx  +  v  cov(v)  =  pzI  , 

we  can  translate  into  the  control  problem: 

x  =  Ftx  +  Gtu  . 

y  =  htx 

J  =  /  yTy  + 

c0 

The  transformation  described  by  Sannuti  could  then  be  performed  so  that: 

x0  =  AOO^  +  Aoixi 

k 

xi  a  =  1  ^ijxj  +  ^iu  *  *  l,...,k 

j=0 

xib  =  aixl  +  xi+l  »  1  =  1 . k_1 

y  =  Hxl 

j  =  _L_  j  yTy  +  y2uTu  . 

^0 

Translating  back  into  the  estimation  context  we  obtain: 

x0  =  A00x0  +  l  AioXia 
1=1 

k 

*j  -  Aij*ia  +  *0-1)6 
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The  original  equations  for  the  control  problem  implied  the  structure  of 
Fig.  4-1,  while  those  for  the  filtering  problem  imply  the  structure  of  Fig. 
4-3.  Therefore,  we  find  that  in  the  dual  system,  the  form  of  the  transformed 
structure  is  the  same  (chains  of  integrators),  with  process  noise  replacing 
the  inputs  and  measurement  noise  added  to  the  outputs. 

The  only  minor  differences  are  that  the  input  and  output  matrices  have 
reversed  position  and  the  state  variables  have  assumed  a  transposed  position 
in  the  structure.  However,  since  the  variable  labeling  is  arbitrary,  we  can 
expect  results  for  the  filtering  problem  which  are  identical  in  form  to  those 
obtained  for  the  control  problem. 

To  demonstrate  the  implications  of  Sannuti’s  results  for  an  estimation 
problem,  consider  a  simple  example.  Given  that  the  special  form  required 
by  Sannuti  is  quite  similar  for  the  estimation  problem,  we  can  formulate  a 
problem  that  is  already  in  the  required  form.  Assume  for  simplicity  that  we 
have  a  similar  system  to  the  one  considered  in  the  control  example: 
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We  can 

write  the 

filtering  equations: 

x  =  Ax 

+  PCTp 

-2(?-Cx) 

P  =  AP 

+  PAt 

+  BBT  -  PCTCPu“2 

which 

for  our  example  become: 

• 

A 

*1  = 

x2  +  U~ 

2Pll(y  -  *i) 

A 

x2  = 

x3  +  y~ 

A 

2Pl2(y  -  x3) 

A 

x3  = 

V~ 

2P13(y  -  x3)  . 

In  the 

steady  state, 

PlL  = 

2u5/3 

P22  =  3p 

p12  = 

2^/3 

P23  =  2y2/3 

PU  = 

u 

P33  =  2y1/3 

which 

result  from 

the  Riccati 

equations : 

Pli  = 

2P12  - 

P_2pll2 

?12  = 

P22  + 

Pl3  -  y-2PnP12 

p13  = 

p23  “ 

y_2PnPl3 

p22  “ 

2p23  _ 

y-2Pl22 

P23  = 

P33  - 

y“2Pl2P13 

P33  = 

1  - 

y-2Pl32  . 

Now  consider  the 

transformation : 

w*V*Ir’V*3 
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x^  =  p  5/6x^ 


X2  =  p-^/2x2  y  =  p-5/6y 


X3  =  p  ^/^X3 


Then  the  Riccati  equations  become: 


2Pi2  -  Pn2 
P22  +  Pi 3  “  P11P12 
P23  -  PUP13 
2P23  -  P122 
P33  -  P12P13 
1  -  P13 

=  p5/3pn 
=  p^/3p12 


p2/3P23 

pl/3p33 


Solving  the  algebraic  Riccati  equations,  we  get 


pll  =  2  p23  =  3 

p12  =  2  p23  =  2 

P1 3  =  1  p33  =  2 


which  upon  transformation  back  to  P  yields  our  original  result. 


<*v 


•< 


ft 


3 


t 


$ 


;; 
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The  filtering  equations  become: 


— 

— 

*1 

0 

1 

0 

*1 

*11 

pl/3 

• 

A 

x2 

= 

0 

0 

1 

*2 

+ 

Pi  2 

*3 

0 

0 

0 

LO 

L 

*13 

(y-xi)  . 


This  result  clearly  resembles  that  obtained  for  the  control  problem  (time¬ 
scaling  of  the  order  p^-*). 

From  this  new  form  of  equations  we  can  easily  see  the  effects  of  the  low 
measurement  noise. 

1.  The  dynamic  behavior  of  the  Riccati  equation  is  "speeded  up," 
thereby  indicating  that  the  filter  reaches  steady-state  much 
more  quickly. 

2.  The  filter  dynamics  themselves  are  speeded  up. 

3.  The  increased  rate  of  filter  convergence  is  given  by  a  factor 
pl/3  with  three  integrations.  This  would  be  p^-'k  for  k  inte¬ 
grations,  paralleling  the  cheap  control  case. 

4.  The  ending  steady-state  covariances  are  decreased  much  more  for 
state  variables  that  are  close  to  the  output  rather  than  the 
input  (see  Fig.  4-4).  Therefore,  the  difference  in  covariance 
of  the  process  and  measurement  noise  intensities  is  spread 
evenly  across  the  integrators,  so  low  measurement  noise  may 

not  be  extremely  useful  if  we  are  estimating  state  variables 
in  a  long  sequence  of  integrations. 

5.  For  the  opposite  case,  when  p  is  very  large  instead  of  very 
small,  the  trend  is  reversed.  We  will  be  capable,  compara¬ 
tively,  of  estimating  variables  close  to  the  process,  while 
variables  that  are  close  to  the  measurement  in  the  chain  may 
not  be  possible  to  estimate. 

6.  For  the  reversed  case  (p  large),  the  convergence  of  the  covar¬ 
iance  (and  filter)  to  the  steady  state  will  be  very  slow. 
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NEARLY-SINGULAR  NONLINEAR  ESTIMATION 

The  filtering  problem  for  this  type  of  model  is  discussed  by  Katzur, 
Bobrovsky,  and  Schuss  [24].  A  system  of  the  form: 


dxt  =  m(xt)dt  +  odwt 
dyt  =  h(xt)dt  +  edvt 


(linear  observations) 


is  discussed,  where  h(xt)  is  linear  in  xt ,  wt  and  vt  are  unit  intensity 
Brownian  motion  processes  and  e  is  a  small  positive  parameter.  The  propa¬ 
gation  of  the  conditional  density  function  in  the  form  of  Kushner's  equation 
is  discussed.  This  yields  an  equation  of  the  form: 


dp  =  A*pdt  H - (h-h)  (dy  -  h(xt))  dt 

e2 

which  is  approximated  using  a  power  series  expansion  in  e.  A  similar  proce¬ 
dure  is  used  for  Zakai’s  equation  for  the  unnormalized  conditional  distribu¬ 


tion  in  the  case  where  h(xt)  =  xt  (dealing  with  scalars). 

An  estimate  for  x  (leading  terra)  is  then  obtained  as  the  solution  to  the 
differential  equation 


dx  =  -  a  — +  — —  dy  =  —2—  (dy-x) 
e  e  e 


Additional  terms  are  obtained  in  higher  orders  of  e  to  improve  accuracy. 
These  additional  terms  are  found  as  functions  of  Xq  where  Xq  is  the  solution 
to  the  differential  equation  above.  This  yields,  to  first  order  in  e, 


*  =  XS  +  -f"  ra(xo}  +  *  •  • 


w 


m 
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The  advantages  of  this  formulation  is  that  a  fast  analog  processor  can 
perform  the  averaging  process  required  in  the  calculation  of  x^ .  After  this 
smoothing  integration,  a  slower  sampling  can  be  performed,  followed  by  the 
nonlinear  approximation  above.  Therefore,  this  technique  naturally  yields  an 
estimator  structure  with  a  fast  front  end"  and  a  slow  "back  end"  processor. 

4.3  FILTERING  WITH  DYNAMICS  POSSESSING  NATURAL  TIME  SCALES 

A  second  type  of  perturbed  filtering  problem  which  is  of  interest  is  the 
case  where  the  perturbation  parameter  is  in  the  dynamics  of  the  system  instead 
of  the  noise  magnitudes.  This  problem  and  its  control  dual  have  been  inves¬ 
tigated  by  Chow  and  Kokotovic  [25],  Haddad  [26],  and  Marchetti  [27], [28]. 

In  these  cases  the  system  displays  behavior  on  multiple  time  scales  before 
the  noise  levels  or  control  penalties  are  considered. 

4.3.1  Continuous-State  Problems 

Chow  and  Kokotovic  [25]  analyze  a  control  problem  which  is  nonlinear  and 
displays  behavior  on  two  time  scales.  The  model  employed  is  of  the  form 

x  =  a^(x)  +  A]^(x)z  +  B^(x)  u 

ez  =  a2(x)  +  A2<x)z  +  B2(x)  u 

CO 

J  =  /  [P(x)+S'(x)z  +  z'Q(})z  +  u'R(x)u]  dt 
0 

In  their  analysis,  they  attempt  to  obtain  an  approximate,  but  lower 
dimension  solution  to  this  problem.  The  approach  that  is  used  is  to  initially 
set  e=0,  solve  for  z  and  then  determine  the  optimal  slow  control  for  the 


reduced-order  system.  The  fast  subproblera  is  then  solved  using  the  fast 
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subproblem  Is  then  solved  using  the  fast  dynamics  and  the  solution  to  the  Slow 
control  problem.  The  two  controls  are  then  added  together  to  obtain: 

uc  =  us  +  ut  . 

where  us  is  the  optimal  control  for  e=0  and  uf  is  a  correction  term  required 
because  e*0.  The  dual  estimation  problem  is  quite  similar.  The  extra  diffi¬ 
culty  that  arises  for  the  estimation  problem  is  that  we  also  have  white  noise 
present,  which  is  fast  when  viewed  at  any  time  scale.  The  dual  problem  is, 
however,  discussed  by  Haddad  [26]  with  results  similar  to  those  for  the  con¬ 
trol  problem. 

Haddad  considers  a  system  of  the  form: 

x  =  A^x  +  A]^2  +  B^u 
ez  =  A21X  +  A22Z  +  B2U 
y  =  Cjx  +  C2z  +  v 

His  first  step  is  to  perform  a  linear  transformation  to  separate  the  fast  and 
slow  parts  of  the  state  to  obtain: 

n  =  Aon  +  Bqu 

=  A2C  +  B2U  n(tQ)  =  no 

y  =  CQh  +  C2C  +  v  ~  SO  • 

As  in  Chow  and  Kokotovic's  solution  to  the  control  problem,  Haddad  splits 
the  solution  into  two  parts.  The  first  part  is  obtained  by  setting  e=0  and 


solving  for  F,  in  terms  of  u,  thereby  assuming  for  this  portion  that  £  is  a 
white-noise  process.  The  resulting  "slow"  solution  is  shown  to  be  valid  to 
0(z)  for  t  >  tg  +  u ,  where  p  is  a  small  positive  number.  For  the  interval 
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,  V 
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tg  <  t  <  tQ  +  pi,  an  additional  portion,  known  as  the  boundary  layer,  must  be 
added  to  the  solution. 

The  term  "boundary  layer"  refers  to  an  additional  part  of  the  solution 
which  is  an  adjustment  for  the  facts  that  we  have  initial  conditions  and  the 
£  process  is  not  actually  white  noise. 

The  resultant  filter  is  given  by: 


T)  =  Aon  +  (Pi Co'  +  Pi2c2' )R-1-(y  -  Con  -  C2C) 
vl  =  A2I  +  (eP^Cq'  +  P2C2')R-1(y  -  Cqp  “  C2C) 


P2  and  P|2  are  given  by 


P2(t)  =  P^TT  +  P2(  6 )  +  0(e) 


Pl2(t)  =  Pl2(t>  +  Pl2C  e>  +  0(e) 


where  0  =  t/e  and  P2(0),  Pi2(0)  are  corrections  for  boundary  layer  effects. 

The  slow  mode  filter  may  be  implemented  without  the  correction  terms  and  still 
be  valid  for  all  t  >  to  because  the  filter  dynamics  are  slow  and  therefore 
the  boundary  layer  time  interval  will  have  negligible  effect  on  its  output. 


4.3.2  Discrete-State  Problems 


Marchetti  [27]  deals  with  discrete-state  space  problems,  specifically 
Markov  chains.  The  system  dynamics  in  these  problems  are  singularly  perturbed. 
The  systems  dealt  with  are  of  the  form: 


dpte  =  (B+eA)ptc 

dyt  =  h(xtE)dt  +  dwte 
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where  p te  is  an  n-dimensional  column  vector  of  probabilities,  (B+eA)  is  a 
state  transition  matrix,  we  is  a  standard  Brownian  motion  process  and  h(xtE) 
is  a  function  of  the  state.  The  fact  that  the  transition  matrix  is  B+eA, 
will  yield  multiple  time-scale  behavior  for  the  system,  given  that  B  satis¬ 
fies  certain  conditions  on  its  eigenvalues.  Marchetti  derives  formulas  for 
propagating  the  Zakai  equation  by 

1.  Writing  pte  as  a  power  series  in  e, 

00 

Pt£  =  l  Pk(c>ek  • 
k=0 

2.  Substituting  in  the  Zakai  equation  and  equating  power  of  e. 

3.  Writing  the  equations  above  in  aggregate  and  decentralized 
forms,  thereby  reducing  the  order  of  the  systems  of  differential 
equations . 

Using  the  second  step  we  can  approximate  the  probability  mass  function 
for  the  states  of  the  system  to  any  order  in  e  that  we  desire;  however,  the 
approximation  will  only  be  valid  on  an  interval  [0,T].  In  [28]  a  variation 


on  this  is  developed  which  produces  approximations  over  intervals  of  length 
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SECTION  5 


FILTER  ARCHITECTURES  FOR  TIME-SCALE  APPROXIMATIONS 


5-1  COMPARISON  OF  FILTERING  TECHNIQUES  TO  THE  REQUIREMENTS  OF  OUR  PROBLEM 

From  our  original  discussion  we  are  interested  in  the  estimation  problem 


for  the  system 


dx  =  Ai(x)  +  B^( e)  dw| 

edz  =  A2(x)  +  B2(e)  dw2 

(e)dy  =  C(xi,z)  +  D(e)  dv 


The  noise  terms  have  magnitudes  which  are  functions  of  e  and  therefore  the 
system  is  in  the  nearly  singular  category  (in  addition  to  a  perturbation  in 
the  dynamics).  In  particular,  we  are  interested  in  the  case  of  poor  measure¬ 
ments  so  we  might  set  D(e)  =  /e  I,  and  0(B^(e))  =  0(1),  0(B2(e))  =  0(/e~). 

The  problem  considered  by  Katzur  et  al.  [24]  falls  into  the  same  problem 
category,  but  deals  only  with  the  scalar  case  and  a  system  with  one  time  scale 
In  addition,  it  deals  with  small  noise  instead  of  large  noise.  Finally,  the 
nonlinearities  occur  only  in  the  state  dynamics,  not  the  observations. 

The  work  by  Haddad  [26]  is  similar  to  our  problem  except  that  ours  is 
nonlinear  and  he  does  not  assume  e  dependence  of  the  noise  magnitudes.  In 
addition,  his  measurements  are  a  function  of  both  the  fast  and  slow  processes, 
whereas  we  are  measuring  only  the  fast  process. 
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Krener  [17]  does  not  dea1  with  more  than  one  natural  time  scale  and 
assumes  that  the  e  dependence  of  the  noise  produces  small  noise  instead  of 
large  noise.  Chow  and  Kokotovic  [25]  deal  with  a  system  with  natural  time 
scales  and  nonlinearities.  However,  the  problem  does  not  contain  e  dependen¬ 
cies  in  the  cost  function  which  would  become  noise  magnitudes  in  the  filtering 
dual . 

Sannuti  does  not  assume  natural  time  scales  for  the  system,  but  does  deal 
with  time  scales  once  they  arise  from  perturbation  parameters  in  the  penalty 
function.  The  dual  of  his  assumptions,  however,  are  that  the  noise  intensity 
is  low. 

Finally,  Marchetti  assumes  perturbation  parameters  in  the  system  dynamics 
The  noise  is  also  assumed  to  be  bad  in  the  case  of  [28].  In  [27]  he  handles 
an  approximation  that  is  valid  only  on  [0,T],  but  [28]  improves  this  to  the 
i nterval  [0,1/e]. 

In  general,  the  exact  problem  that  we  are  interested  in  has  not  been 
handled  in  the  literature  due  to  a  combination  of  the  following  features: 

1.  Measurements  are  nonlinear  functions  of  the  state. 

2.  Perturbations  exist  in  both  the  dynamics  and  the  noise 
magnitudes . 

3.  We  are  interested  in  the  high-noise  case,  not  the  low-noise 
case . 

Regardless  of  the  exact  applicability  of  the  techniques  covered  in  the 
literature,  we  can  examine  the  structures  of  the  processors  that  are  implied 


by  each  of  the  techniques  (or  corresponding  duals). 
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5.2  FILTER  ARCHITECTURES  OF  EXISTING  TECHNIQUES 


5.2.1  Fast  Front  End,  Slow  Back  End  With  Feed-Forward  and  Feedback  (Fig.  5-1) 


This  structure  was  motivated  by  the  structures  of  existing  processors. 


Not  all  existing  structures  have  the  feedback  present,  and  depending  on  the 


problem  conditions,  it  may  or  may  not  be  very  useful. 


5.2.2  Slow  Processor  with  Correction  (Fig.  5-2) 


This  structure  is  motivated  by  the  work  of  Chow  and  Kokotovic  [25]  and 


Haddad  [26] .  In  both  cases,  the  slow  processor  is  the  processor  that  would 


be  obtained  if  e=0,  with  the  fast  dynamics  assumed  arbitrarily  fast.  The  slow 


results  are  then  used,  in  combination  with  the  model  of  the  fast  dynamics,  to 


determine  the  estimates  of  the  fast  variables. 


5.2.3  Fast  Estimates  Followed  by  Slow  (Fig.  5-3) 


This  structure  is  motivated  by  problems  with  e  in  the  cost  function  or 


noise  magnitude  (Schumacher,  Krener,  Willems).  The  parameter  is  taken  into 


the  system  dynamics,  resulting  in  a  set  of  variables  that  may  be  estimated 


quickly.  Once  these  estimates  are  determined  (i.e.,  steady  state  is  reached) 


they  can  be  used  to  help  in  the  estimation  of  variables  that  are  corrupted  by 


greater  noise. 


5.2.4  Steady-State  Solution  with  Boundary  Layer  (Fig.  5-4) 


This  structure  was  motivated  by  Sannuti  and  the  cheap  control  problem. 


In  this  case  a  control  was  found  which  was  valid  on  [0,®] ,  but  a  correction 


had  to  be  made  over  the  initial  time  interval  [0,e]  to  correct  for  the  initial 


conditions.  This  correction  was  described  as  a  "boundary  layer." 
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5.2.5  Multiple  Lower-Order  Filter  (Fig.  5-5) 

This  structure  was  motivated  by  Marchetti's  paper  on  the  propagation  of 
the  Zakai  equation  for  a  Markov  chain.  Multiple  lower-order  filters  perform 
calculations  which  are  combined  to  determine  an  approximate  result  for  the 
normalized  probability  mass  function. 

5.2.6  Fast  Analog  Processor  Followed  by  Slow  Digital  Processor  (Fig.  5-6) 
This  structure  was  proposed  by  Katzur  et  al.  In  [24]  they  showed  that 

for  a  nonlinear  system  and  small  measurement  noise,  a  linear  analog  filter 


could  reduce  the  sample  frequency  prior  to  nonlinear  manipulations 
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Figure  5-4.  Steady-State  Solution  with  Boundary  Layer 
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Figure  5-5.  Multiple  Lower-Order  Filters 
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Figure  5-6.  Fast  Analog  Processor  Followed  by  Slow  Digital 
Processor  (Nonlinear  Operations) 
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SECTION  6 


FILTERING  PROBLEM  FORMULATIONS 


In  this  section  we  describe  both  the  specific  nonlinear  filtering  prob¬ 
lems  selected  for  analysis  and  the  conjectured  behavior  and  filter  architec¬ 
ture  that  these  models  suggest.  The  key  features  that  we  wished  to  capture 
in  the  problems  chosen  for  study  are: 

•  The  system  models  should  possess  both  fast  and  slow  dynamics 

•  Direct  measurements  of  only  the  fast  variables  are  made,  and 
these  measurements  may  be  of  poor  quality 

•  The  principal  objective  is  to  estimate  the  slow  variables 

•  The  problems  chosen  should  be  as  simple  as  possible  in  order 

to  facilitate  analysis  and  our  ability  to  gain  insight  into  the 
character  of  such  estimation  problems. 

As  discussed  previously,  the  choice  of  these  features  was  motivated  by  desire 
to  capture  some  of  the  critical  aspects  of  passive  acoustic  tracking  problems. 


6.1  EXAMPLES  SELECTED  FOR  ANALYSIS 


v  o: 


In  this  section  we  describe  two  closely  related  examples  that  have  formed 
the  focus  of  our  detailed  study.  Both  involve  estimation  for  a  particular 


finite-state  Markov  process  possessing  two  time  scales. 


6.1.1  Measurements  Corrupted  by  White  Noise 


>+  - 

s.-" 

$  . 


In  this  problem  we  are  interested  in  estimating  the  state  of  the  4-state, 
continuous-time  finite  state  Markov  process  p(t)  whose  transition  behavior  is 
depicted  in  Fig.  6-1.  Here  e  is  a  small  parameter.  This  process  spends  most 


ra  - 
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A.1  X2  X3  X4 


R-5558 


Figure  6-1.  A  4-State  Markov  Process 


of  its  time  jumping  between  1  and  2  or  between  3  and  4  (the  fast  dynamics) 
and  occasionally  jumps  between  the  left  {1,2}  and  the  right  {3,4}  (the  slow 
dynamics).  We  are  ultimately  interested  in  tracking  the  slow  behavior  given 
noisy  measurements  of  only  the  fast  dynamics.  The  measurement  model  we  use 
to  capture  this  is 


dy(t)  =  g(e)  h(p(t))dt  +  dv( t ) 


(6-1) 


where  v(t)  is  a  standard  Brownian  motion,  independent  of  p(t).  The  quantity 
g(e)  is  a  function  of  e  which  is  used  to  model  the  relative  quality  of  our 
measurements.  Its  magnitude  relative  to  e,  which  directly  controls  the  SNR 
of  the  measurements,  is  of  central  importance  in  our  analysis.  Finally,  to 
guarantee  that  our  measurements  provide  us  only  with  direct  information  about 
the  fast  dynamics,  we  assume  that 


h(l)  =  h( 3)  =  a 
h ( 2 )  =  h(4)  =  B 


(6-2) 
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i.e.,  that  our  measurements  indicate  only  if  the  process  is  on  the  top  {1,3} 
or  the  bottom  of  {2,4}. 


6.1.2  Perfect  Measurements  with  Small  Differences  In  Rates 

In  this  second  model  we  consider  the  same  4-state  process  in  Fig.  6-1, 
but  in  this  case  we  assume  that  we  have  perfect  knowledge  of  whether  the  pro¬ 
cess  is  on  the  top  {1,3}  or  the  bottom  {3,4}.  In  order  to  capture  the  feature 
that  the  measurements  contain  only  weak,  indirect  information  about  left-right 
behavior,  we  assume  in  this  case  that 


A3  =  Ai  +  a  g(e) 
A4  =  A2  +  3  g(e) 


(6-3) 


6.2  CONJECTURED  ASYMPTOTIC  BEHAVIOR  AND  FILTER  ARCHITECTURES 

In  this  subsection  we  perform  some  initial  analysis  of  the  nonlinear 
filtering  problems  introduced  in  the  preceding  subsection.  The  end  results 
of  these  analyses  are  several  approximate  filter  structure  whose  properties 
and  performance  versus  the  optimal  solution  are  explored  in  subsequent 
sections . 


6.2.1  The  White  Measurement  Noise  Model 


Pi(t)  =  Pr[p(t)  =  i  ]  y(x) ,  x<t] 


(6-4) 


Then  from  standard  results  in  nonlinear  filtering  [13],  we  have  that 
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dPl(t)  =  [(-Xi  -  eui)Pi(t)  +  X2  P2 C t )  +  ep2  P3 ( c ) ] d t 
+  g( e) (a  -  h( t ) ]  p1(t)[dy(t)  -  g(e)  h(t)dtj 


dP2(c>  =  [Xi  Pl(t)  +  (-X2  -  £P3)  P2( c )  +  £P4  P4(t))dt 
+  g(e)[B  -  h(t) ]  p2(t)[dy(t)  -  g(e)  h(t)dt] 


dP3( t)  =  [em  Pi(t)  +  (-X3  -  eu2)P3(t)  +  x4  P4(t))dt 
+  g(e)[a  -  h(t) ]  p 3< t)  [dy(t)  -  g(e)  h(t)dt] 


dP4(t)  =  [£M3  P2(c)  +  x3  P3(0  +  (-X4  -  £P4)P4( t ) ]dt 
+  g(e)[S  “  h(t)]  P4(t)[dy(t)  -  g(e)  h(t)dt] 


where 


h(t)  =  a[pi(t)  +  P3( t )  ]  +  g[p2(t)  +  P4 ( t )  ] 


As  in  Harchetti's  work,  it  is  quite  useful  to  transform  coordinates  to  high 
light  explicitly  the  aggregate  variables  of  interest.  In  this  example,  one 
can  give  simple  explanations  for  the  several  variables  that  arise.  Specif i 
cally,  let 


PL(t)  =  Pi(t)  +  P2(t) 


pR(t)  =  P3<t)  +  P4(t) 


Sl(t)  =  pi(t) - pL(t) 

Xl  +  X2 


62(  c )  =  P3^t) 


pR(t) 
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Here  p^(t)  and  p^( t )  are  Che  probabilities  of  being  on  Che  left  and  right, 
respectively.  Also,  6 y ( t )  is  the  deviation  between  the  exact  probability  of 
being  in  state  1  and  its  approximation  obtained  by  multiplying  the  probability 
of  being  on  the  left  by  the  steady-state  probability  of  being  on  the  top  given 
the  process  is  on  the  left  and  no  left-right  transitions  can  occur.  An  anal¬ 
ogous  interpretation  can  be  given  to  62(t).  Note  that  one  might  expect  6^(0 
and  62(0  to  be  small  since  the  process  essentially  can  reach  steady-state 
between  left-right  transitions. 

If  we  transform  variables  according  to  Eq.  6-7  and  eliminate  p^(t)  by 
replacing  it  by  1  -  pL(t),  we  obtain  the  following  set  of  three  coupled  sto¬ 
chastic  differential  equations  for  the  exact  optimal  nonlinear  filter: 


dpL(t)  =  e [ Y2  “  Cyi  +  Y2)PL(t)ldc  +  £l-TU  6i(t)  +  n2  62C t > ] dt 
+  g(e)[fL  -  h( t ) ]  pL(t)dv(t)  +  g(e)  A6i(t)dv(t) 


(6-8a) 


dS^t)  =  -hi  6i(t)dt  +  e[n5  -  113  6i(t)  +  (n4  ~  ns)?1"^)  +  %  62C  t )  ]  dt 
+  g(e)Ul  pL(t)  +  ?2  <5l(t)  -  Mt)  6i(t)]dv(t) 


(6-8b) 


d6 2< c )  =  “A 2  $2( i )dt  +  e[ng  -  n 7  62(C)  +  (nio  ~  h8)PL(c)  +  H9  6i(t)]dt 
+  g(e)U3  -  C3  PL(t)  +  64  6 2C C )  “  h(t)  62(t)]dv(t) 


(6-8c ) 


where 


dv(t )  =  dy(t)  -  g(e)  h(t)dt 


(6-9a) 


h(t)  =  fR  +  (fL  -  fR)  pL(t)  +  A[ 6L( t )  +  62( t ) ] 


( 6-9b) 


and  the  various  constants  appearing  in  Eqs.  6-8  and  6-9  are 


s. 

V.  f. 


J 


•4  « 


ALPHATECH,  INC 


A  =  a  -  B 


A2  -  *3  +  A4 


ri 2  =  M2  "  M4 


A1  X2(m3  -  Ml) 


H4  = 


Ui  +  a2)2 


M2  A1  +  MA  A2 


H6  = 


Xl  +  X2 


X3  x2( U4  -  y2) 


n8  = 


( X3  +  A4) 2 


Ml  A2  *3  "  M3  A1  A4 


’no  = 


(Xl  +  X2)( X3  +  X4) 


a  X[  +  8  X2 


Xl  +  x2 


a  X3  +  8  X4 


X3  +  Xa 


aA  M2  +  a3  MS 


T2  = 


X3  +  Xa 


Al  =  Xi  +  X2 


ni  =  m  -  M3 


Ml  Xl  +  U3  x2 


Xl  +  x2 


M2  A1  aA  "  MA  A2  x3 


( Xi  +  X2)(  X3  +  Xa) 


M2  X3  +  m4  Xa 


X3  +  Xa 


Ml  a3  +  M3  X4 


X3  +  Xa 


Xi  X2(a  ~  B) 


(Xi  +  X2)2 


X3  Xa(  a  -  B) 


( X3  +  Xa)2 


X2  Ml  +  Xi  m3 


Xi  +  X2 


Xi  B  +  X2  a 


Xi  +  X2 


X3  B  +  X4  a 


X3  +  Xa 


Note  that  the  innovation  dv(t)  in  Eq .  6-5  is  a  standard  Brownian  motion 


nV.'V-'v, 
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Since  we  will  be  focusing  on  the  estimation  of  the  slow  lef t-to-right 
transitions,  it  is  useful  to  change  time  scales  to  that  at  which  these  tran¬ 
sitions  occur,  i.e.,  we  will  make  a  change  of  scale  of  the  form 


cold 


*■  new 


e 


(6-11) 


Performing  this  change  of  scale  on  Eq .  6-8,  with  care  taken  in  accounting  for 
the  quadratic  variation  of  the  innovations,  we  obtain  the  following 


dpL(t)  =  [  Y2  “  (Y1  +  "Y2 )  PL(  t )  ]  d  t  +  [q2  62(c)  -  m  6l(t)]dt 


g(e)  -  g(e) 

+  tfL  "  h(t)]pL(t)  dw(  t )  +  -  A6i(t)  dw(  t ) 

/  e  /  e 


(6-12a) 


d6]_(t) 


A1 

- 6i(t)dt  +  [ n 5  -  n 3  6i(t)  +  (04  -  ns)pL(t)  +  n6  62(t)]dt 

(6-1 2b) 

g(e) 

+  -  [Cl  pL(t)  +  C2  <5 1 C t )  -  h(t)  6i(t)]dw(t) 

/  e 


d62(t) 


a2  . 

-  —  6 2 ( t )dt  +  [ng  -  h4  62(f)  +  (nio  ~  n8)pL(t)  +  n9  6^ ( t ) ] d t 

( 6-1 2c ) 

g(e) 

+  ■  [C3  “  C3  PL(t)  +  C4  62(f)  -  h(t)  62(t)]dw(t) 

/  e 


where  dw(t)  is  a  standard  Brownian  motion  representing  a  scaled  version  of 
the  innovations. 

We  now  turn  our  attention  to  the  description  of  two  different  approxima¬ 
tions  of  the  optimal  estimator.  Both  are  based  on  the  fact  that  at  the  slow 
time  scale  the  left-right  transition  behavior  is  approximately  that  of  a 
2-state  Markov  process  [ 2  9  j  .  Specifically,  consider  the  time-scaled  4-state 
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process  p(t/e).  Over  any  interval  At,  this  process  undergoes  numerous  (indeed 
infinitely  many  as  e  +  0)  top-to-bot tom  transitions.  This  suggests  two  impor¬ 
tant  approximations.  The  first  is  that  if  we  aggregate  away  these  very  fast 
transitions,  we  will  be  left  with  a  two-state  Markov  process.  Specifically, 
let 

(  L  if  p(t/e)  =  1  or  2 

q(t)  =  (6-13) 

(  R  if  p(t/e)  =  3  or  4  . 

Then,  as  shown  in  [29],  q(t)  is  asymptotically  a  2-state  Markov  process  with 
transition  behavior  depicted  in  Fig.  6-2.  Here  the  rate  yi  represents  a 
weighted  average  of  the  lef t-to-right  rates  and  ^3  in  Fig.  6-1,  where  the 
weights  equal  the  ergodic  probabilities  of  being  in  states  1  and  2  when  we 
neglect  lef t-to-right  transitions  (i.e.,  the  fast  top-to-bottom  transitions 
essentially  reach  equilibrium  before  a  transition  from  left  to  right  occurs  so 
that  the  rate  of  such  a  transition  can  be  computed  by  this  weighted  average). 
Obviously,  there  is  an  analogous  interpretation  for  Y2' 


Vi 
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Figure  6-2.  The  2-State  Aggregate  Process 

The  second  approximation  is  based  on  the  fact  that  over  time  interval  At, 
the  fast  top-to-bottom  transitions  have  a  similar  averaging  effect  on  the 
observations,  i.e.,  that  we  can  model  our  observations  as 

dy(t)  =  g(t)  f ( q  (  t  )  j  d  t  +  dv(t)  (6-14) 
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where 


f(L)  =  fL  ,  f(R)  =  fR 


(6-15) 


again  represent  averages. 


The  resulting  nonlinear  filtering  equation  in  this  case  is  given  by 


dpL(t)  =  £ [ Y2  “  (Y1  +  Y2)PL(t)ldt  +  8(e)[fL  ~  h( t ) ] pL( t ) [dy ( t )  -  g(e)h(t)dt] 


whe  re 


h(t)  =  fR  +  (fL  -  fR)pL(t) 


(6-16) 


(6-17) 


Comparing  Eqs.  6-16  and  6-17  to  Eqs.  6-12  and  6-9,  we  see  some  important  sim¬ 
ilarities.  Indeed,  if  6i  and  62  were  replaced  by  zero  in  Eqs.  6-9  and  6-12, 
the  equations  in  fact  become  identical.  One  might  conjecture  then  that  under 
the  right  circumstances,  Eqs.  6-16  and  6-17  would  be  an  excellent  approxima¬ 
tion  to  the  optimal  estimator  of  the  slow  variables.  The  analysis  and  simu¬ 
lation  results  in  the  following  sections  show  that  for  measurements  with  a 
particular  range  of  signal-to-noise  ratios,  as  characterized  by  g( e) ,  this 
is  in  fact  the  case. 

It  is  instructive  to  examine  the  architectural  implications  of  the 
approximation  we  have  just  described.  In  Fig.  6-3  we  have  depicted  the 
architecture  of  the  optimal  estimator,  in  which  there  are  both  fast  and  slow 
variables,  while  in  Fig.  6-4  we  have  the  much  simple  (pictorially  and  computa¬ 
tionally)  approximate  estimator.  In  this  estimator  we  are  in  essence  taking 
advantage  of  the  slow  p^  dynamics  to  average  out  the  effects  of  the  fast  fluc¬ 
tuations  in  the  data.  In  fact  this  also  suggests  a  related  approximation. 

In  particular,  since  the  approximate  estimator  has  no  fast  variables,  its 
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Digital  Back-End  (Low  Sampling  Rate)  | 
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Figure  6-5.  A  Front-End/Back-End  Structure  Based  on  the  Approximate 
Filter  in  Fig.  6-4 


It  is  also  possible  to  describe  a  considerably  different  front-end/ 
back-end  structure  for  an  approximate  estimator.  This  structure  is  motivated 
explicitly  by  passive  acoustic  tracking  architectures  in  which  the  front  end 
uses  a  batch  of  data  to  estimate  slowly-varying  quantities  —  bearing,  fre¬ 
quency  —  based  on  the  assumption  that  these  quantities  are  constant  over  the 
time  interval  of  observation.  The  sequence  of  estimates  produced  by  the  front 
end  then  are  used  as  measurements  by  the  back  end  which  attempts  to  track 
these  slowly  varying  quantities. 

In  our  present  context,  a  structure  of  this  type  would  have  the  form 
depicted  in  Fig.  6-6.  Here  the  front-end  uses  a  batch  of  data  for 
kT(e)  <  t  <  (k+l)T(e)  to  perform  an  hypothesis  test  —  is  the  processor  on 
the  right  or  the  left  —  based  on  the  assumption  that  no  lef t-to-right  transi¬ 
tions  can  occur,  i.e.,  on  the  model  depicted  in  Fig.  6-7.  The  sufficient  sta¬ 
tistic  for  this  hypothesis  test  can  be  taken  to  be  either  P^,  the  conditional 
probability  that  the  process  in  Fig.  2-7  is  on  the  left,  or  the  likelihood 
ratio 
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Figure  6-6.  Front-End/ Back-End  Structure  with  Front-End  Hypothesis  Testing 


R-5564 

Figure  6-7.  Model  on  Which  the  Front  End  of  Fig.  6-6  is  Based 


PL 

L  =  -  .  (6-18) 

1  -  ?L 

The  computation  of  Pl  is  a  nonlinear  filtering  problem  analogous  to  the  one 
described  previously  —  i.e.,  by  getting  the  pj  =  0  in  Eq.  6-5,  letting 

PL  =  (PI  +  P2)  >  PLT  =  PI  »  PRT  =  P3  (6-19) 

and  replacing  P2  by  Pl  -  Plt  and  P4  by  1  -  Pl  ~  Prt»  we  obtain 

dPL(t)  =  g(e)[hL(t)  -  h(t))PL(t)[dy(t)  -  g(e)  h(t)dt]  (6-20a) 

dPLT(t)  =  [-Xi  Plt(c)  +  X2(Pl(0  '  PLT(t>)]dt 

+  g(e)[a  -  h(t)]PLx(t)[dy(t)  -  g(e)  h(t)dt] 
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dpR(t)  ~  l~^3  PRT(C)  +  *4(1  “  “  Prt( *- ) ) ] 

+  g(e)[a  -  h(t)]PRT(t)[dg(t)  -  g(e)  h(t)dt] 


(6-20c) 


where 


h(t)  =  6  +  (a  -  3 ) ( PLT( t )  +  PRT(t)) 


(6-21a) 


«  Plt^) 

hL(t)  =  B  +  (a  -  3)  -  (6-21b) 

PL(t) 

A 

Here  h^Ct)  is  Che  expected  value  of  h(p(t))  given  the  data  and  assuming  that 
the  process  is  on  the  left. 

Several  alternate  forms  of  these  equations  are  also  of  potential  value 
both  in  providing  insight  and  for  their  potential  computational  simplicity. 
First  we  define  the  conditional  probability  of  being  in  state  1  and  state  3 
given  the  data  and  assuming  that  the  process  is  on  the  left  and  right, 
respect ively : 

pLT  pRT 

QlT  =  -  Qrt  =  -  •  (6-22) 

Pl  1  -  pl 


An  application  of  its  differential  rule  [8]  then  yields 


dPL(t)  =  g(e)[hL(t)  -  h(t)]PL(t)[dy(t)  -  g(e)  h(t)dt]  (6-23a) 

dQLT( c )  =  [^2  “  (*1  +  ^2)QLT(t) ldt 

+  8(e)  (a  -  hL(t))QLT(t)  (dy( t )  -  g(e)  hL(t)dt) 


dQRT(c)  =  [ *4  “  (*3  +  ^4)Qrt( ^ ) ]dt 

+  g(e)  (a  -  hR(t))QRX(t)  (dy ( t )  -  g(e)  hR(t)dt) 


(6-23c ) 


where 
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dfc(t)  =  -  —  g2( e) [hL( t )  -  hR(t)]2dt  +  g(e)(hL(t)  -  hR(t)]dvR(t)  (6-28) 


which  is  just  a  simple  accumulation: 


(k+l)T(e) 

£((k+l)T(e))  =  £(kT(e»  -  -j-  g2(e)  /  [hL(t)  -hR(t)]2dt 

kT(e) 


(k+l)T(e) 

+  8(e)  /  lhL(t)  -  hR(t)]dvR(t) 

kT(e) 


(6-29) 


Let  us  comment  on  the  choice  of  initial  conditions  at  the  start  of  each 
interval.  A  natural  set  of  choices  are 


Qlt  = 


Al  +  X2 


A  3  +  A4 


( 6-10a ) 


ard  the  following  equivalent  conditions 


,  L  =  1  ,  l  =  0 


(6-30b) 


Equation  6-30a  corresponds  to  assuming  that  the  fast  process  is  in  equilibrium 
when  we  begin  an  observation  interval.  Equation  6-30b  corresponds  to  no  prior 
information,  i.e.  ,  no  feedback  is  provided  from  the  back  end  in  Fig.  6-6  to 
front  end. 

Finally,  at  the  end  of  each  interval,  we  perform  a  threshold  test.  In 


terras  of  l ,  this  is 
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t 


r 

‘  ^ 


D 


mk+l  =  L 

>  (6-31) 

d[(k+l)T(e)]  0 

< 

mk+l  +  r 

where  "L"  and  "R"  are  the  two  values  of  the  observational  input  to  the  back 
end  in  Fig.  6-6.  The  back  end  is  then  simply  a  sampled  data  estimator  of  the 
2-state  process  in  Fig.  6-2,  with  measurements  provided  every  T( e)  time  units. 
We  can  characterized  the  performance  of  the  front  end  in  terms  of  2  quantities 

(fiLi^e)  =  Prob[m^  =  L  |  process  on  the  left]  (6-32a) 

<(>LR(e)  =  Prob[m^  =  L  j  process  on  the  right]  (6-32b) 

We  explicitly  write  these  as  functions  of  e  to  indicate  that  the  performance 
of  the  test  (Eq.  6-31)  depends  upon  the  size  of  the  time  interval.  Note, 
however,  that  these  performance  measures  only  make  sense  fcr  T(e)  small  with 
respect  to  1/e  so  that  the  process  actually  is  very  likely  to  stay  on  the  left 
or  on  the  right  over  the  entire  interval. 

Given  the  quantities  in  Eq.  6-32,  the  optimal  back  end  can  be  described 
as  follows.  Let 

pL(k| j)  =  Prob[process  on  left  at  time  kT(e)  |  ms  ,  s  <  j ]  (6-33) 


then 
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4>LL(e)PL(k+1  lk) 

pL(k+l|k+l)  - -  (6-35a) 

<fLL(e)PL(k+1  lk)  +  <(iLR(e)  I1  "  PL(k+1lk)J 

while  if  m^+i  =  R 

[1  “  <)>LL(e)]PL(k+1  lk) 

pL(k+l |k+l)  =  — — - (6-35b) 

t1  "  4>LlC  e> )  PLCk-*~l  | k)  +  [1  ~  ^LR(e)][1  ~  PL(k+1lk)] 

The  evaluation  of  4>ll(£)  and  4>LR(e)  are  therefore  essential  both  to  define  the 
update  step  (Eq.  6-35)  and  to  evaluate  its  overall  performance  and  asymptotic 
properties.  A  key  question  here  involves  the  relationships  among  the  time 
scale  separation  controlled  by  e,  the  rate  at  which  information  is  obtained 
controlled  by  g(e),  and  the  data  collection  interval  T(e). 

We  note  that  it  is  also  possible  to  define  a  slightly  different  front- 
end/back-end  structure  in  which  we  eliminate  the  hard  decision  (Eq.  6-31)  in 
the  front  end  and  instead  take  as  input  to  the  back  end  the  likelihood  ratio 

Lk  =  L(kT(e))  .  (6-36) 

Note  that  for  T(e)  small  with  respect  to  1/e  and  with  an  initial  condition  of 
l  on  L  at  the  start  of  an  interval,  Lk  is  approximately  equal  to 

Pr  [data  over  [(k-l)T(e) ,kT(e))  |  process  on  the  left] 

Pr  [data  over  [(k-l)T(e) ,kT(e))  |  process  on  the  right] 

for  the  true  process.  This  suggests  the  following  back  end  algorithm.  The 
prediction  step  is  still  given  by  Eq.  6-34.  The  update  step,  however,  is 
given  by 

kk+l  PL(k+1lk) 

PL(k+l|k+l)  =  -  •  (6-37) 

Lk+1  PL(k+1-ik)  +  U-  -  P L C k+ 1  |  k )  ) 
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Finally,  we  note  that  we  can  combine  our  two  types  of  approximations  to 
obtain  a  far  simpler  one.  Specif ically ,  suppose  we  model  our  observations  as 
in  Eq.  6-14  by  assuming  an  averaging  effect.  Writing  these  in  the  original 
time  scale 


dy(t)  =  g(e )  f(q(t))dt  +  dv(t) 


Then  if  our  front  end  assumes  no  lef t-to-right  transitions,  we  obtain 


(6-38) 


dpL  =  g(e)[fL  -  h( t ) ]  PL(t)[dy(t)  -  g(e)  h(t)dt]  (6-39) 


h(t)=  fR  +  (f^  -  fR)  Pl( c ) 


(6-40) 


If  we  define 


L(t)  = 


pl(c) 
i  -  pl(c) 


(6-41) 


we  obtain 


or  if 


dL(t)  =  g(e)  (fL  -  fR)  L(t)[dy(t)  -  g(e)fRdt] 


fc(t)  =  log  L(t) 


(6-42) 


(6-43) 


d£(t)  =  — -  g2(e)  [fL  -  f R] dt 2  +  g(e)  (fL  “  fR)  [dy(t)  -  g(e)fRdt] 


i.e.  , 


(6-44) 


d  5.  ( t )  =  -i—  g2(e)  [fR2  -  f^2]  +  gU)  (fL  ~  fR)dy(t)  .  (6-45) 
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This  leads  to  the  following  algorithm:  use  Eq.  6-34  for  the  prediction 
step  and  Eq.  6-37  for  the  measurement  update,  with 


Lk  =  e 


(6-46) 


( k+1 )T( e) 

*k  =  —  g2(e)  [fR2  -  fL2lT(e)  +  g(e)  [fL  -  fR]  /  dy(x)  . 

2  kT(e) 


(6-47) 


Note  that  the  basic  processing  step  in  the  front  end  is  exactly  a  simple  aver 
aging  of  the  measurements. 


6.2.2  Perfect  Measurements  with  Small  Differences  in  Rates 

In  this  case,  since  we  directly  observe  whether  the  process  is  on  the 
top  or  the  bottom,  the  exact  filtering  equations  involve  only  PL(t),  the  prob¬ 
ability  of  being  on  the  left.  To  facilitate  our  development,  we  introduce  a 
process  indicating  the  top-bottom  status  of  the  processor 


x(t)  = 


1  ,  p(t)  =  1  or  3 
0  ,  p(t)  =  2  or  4 


(6-48) 


Then,  using  the  analysis  in  Appendix  B  we  have  the  following  description  of 
the  evolution  of  pk(t): 


While  x(t)  =  1 

pk(t)  =  e u 2  “  e(ui  +  U2)  PlU)  +  (*3  “  *].)  “  Pl/t))  (6-49a) 


While  x(t)  =  0 

pL(t)  =  eu4  -  e(p3  +  P4)  pk(t)  +  (*4  “  ^2)  PL^H1  “  PlU)]  (6-49b) 


.%  wV  JV  JV /.  l.“.  *.  *-  *-  *r  N  *.  *,  J 
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1- 


When  x(t-)  =  1  and  x(t)  =  0 

PL(t)  = 


*1  PL(t-> 


(Xi  -  X2>  p^(t-)  +  X3 


(6-49c) 


When  x(t-)  =  0  and  x(t)  =  1 


pl(c)  = 


^2  PL<t-) 


(X2  ~  X4)  pL(t-)  +  X4 


(6-49d) 


These  can  be  combined  into  the  following  stochastic  differential  equation 
where  we  also  use  Eq.  6-3: 


dPL(c)  =  [eu2  “  eC m  +  b2>  Pl(c)  =  <*g(e)  PlCO  (i  ~  Pl( t ) ) Jx( t )dt 


+  [eP4  ~  e ( u 3  +  P4)  PLCt)  +  6g(e)  Pl(0  Q  ~  Pl( t ) ) ]  [  1  -  x(t)]dt 


+  g(e)PLCt)(l-Plj(t)] 


ax(t) 


B[l-x(t)] 


Xi+ag(e)[l-pL(t)j  X2+Bg(£)[l-PL(t)] 


dx(  t ) 


(6-50) 


Two  alternate  forms  of  these  equations  provide  additional  insight.  First 
note  that 


E[dx(t)jx(f),  TCt]  =  ~x( t ) [ X 1  pL(t)  +  X3U  -  pL(t))]dt 


+  [1  -  x(t)j  [X2  PL(t)  +  X4 ( 1  -  pL(t)))dt 


=  -x(t)[Xi  +  ag( e)  (1  -  pl( t ) ) ] 


(6-51) 


+  [1  ~  x(t)]  [ X2  +  Bg( c)  (1  -  PL(t))] 


A  h(t)dt 


H3 
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Using  F.qs.  6-3  and  6-51  we  can  rewrite  Eq.  6-50  as 


dpL(t)  =  e  l  U2  -  (pi  +  n2)PL(t)]x(t)dt  +  elPA  "  (u3  +  UA)Pl(c)H1  "  x(t)]dt 


+  g(e)pL(t)[ l-pL(t)J 


ax(  t ) 

X  q+ag( c ) [ l-pL(t)] 


B[ l-x(t)  ] 

X2+&g(e)[l+PL(c)] 


(dx(t)-h(t)dt) 


(6-52) 


A  second  important  form  involves  two  derived  quantities 


R(t)  =  Total  residence  time  of  p(i)  0<T<t  on  the  top 

(i.e.,  in  states  1  or  3)  .  (6-53) 

K(t)  =  Total  number  of  changes  in  x(t)  over  0<x<t  . 

Then 


dR( t )  =  x(t)dt 

dK( t )  =  ! d  x ( t ) |  =  -x(t)dx(t)  +  [ l-x( t ) ]dx( t )  =  [ l-2x( t ) ] dx( t ) 


(6-5A) 


Note  several  equalities 


x2(t)  =  x  ( t )  ,  [l-x(t)]2  =  l-x(t)  ,  x(t)(l-x(t)]  =  0 

[ l-2x( t ) ] 2  =  l  ,  x( t ) [ l-2x( t ) ]  =  -x(t)  ,  [ l-x( t ) ] [ l-2x( t ) ]  =  l-x(t) 

(6-55) 

Using  Eqs.  6-50,  6-5A,  and  6-55  we  obtain 


dpL(t)  =  [eu4  -  e(u3  +  UA^PlX1)  +  Bg(e)pL(t)  (1  “  PL(t))ldt 
+  [ e ( pi 2  “  ua)  -  e(ui  +  u2  "  M3  “  PA)  PL(t) 

+  (a-B)  g(e)  pL(t)  (1  -  PL(t))]dR(t) 

"  g(c)pL(t)[l-pL(t)] 

(6-56) 


ax(t ) 


B[  l-x( t ) ] 


Xi+«g( e) [ 1-Pl( t ) ]  X2+Bg(e)[l-PL(t)] 


dK(t) 
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Equation  6-56  suggests  one  straightforward  front-end/back-end  decomposi¬ 
tion.  Specifically,  because  of  the  slow  dynamics  in  Eq.  2-56  one  can  imagine 
a  system  of  the  form  depicted  in  Fig.  6—9.  Here  the  front  end  performs  a 
simple  accumulation,  computing  AR  and  AK  over  an  interval  of  time  that  is  long 
with  respect  to  the  fast  dynamics  and  short  with  respect  to  the  slow  dynamics. 
These  quantities  are  then  fed  into  a  sampled  version  of  Eq.  6-56. 

One  can  also  envision  two  other,  more  sophisticated  front-end/back-end 
approximations  to  Eq.  6-56,  each  of  which  is  analogous  to  one  of  the  forms 
described  in  the  preceding  subsection.  In  the  first  of  these,  depicted  in 
Fig.  6-10,  we  perform  a  front-end  hypothesis  test  whose  results  are  fed  into 
a  back-end  slow  estimator.  Comparing  Figs.  6-6  and  6-10,  we  see  that  there  is 
a  difference  in  that  the  slow  estimator  uses  both  the  hypothesis  test  results 
and  the  raw  data  x(t).  To  understand  this,  it  is  important  to  realize  that 
x(t)  provides  us  with  two  types  of  information:  the  indirect  information 
about  whether  p(t)  is  on  the  left  or  right  that  is  embedded  in  the  switching 
behavior  of  x( t )  and  the  di rect  information  concerning  which  lef t-to-right 
rates  --  and  u2  or  U3  and  p4  —  are  in  effect.  Indeed,  consider  the  two- 
state  process  depicted  in  Fig.  6-11.  The  evolution  of  the  unconditional  prob¬ 
ability  of  being  on  the  left  for  this  process  is  given  by 

dpL(t)  =  -  [cmx(t)  +  ep3(l-x(t))]pL(t)  +  [  tP2x( c )  +  eu4<  1— x(  t ) )  ]  [  1— pL(  t )  ]  dt 
=  [eu4  -  t ( U3+M4) Pl( t ) ] dt  +  [e(u2  “  ~  e( p]  +  U2- U3-P4)Pl( 1 ) 1 dR( c ) 

(6-57) 

Compare  this  to  Eq.  6-56.  Note  that  the  remaining  terms  in  Eq.  6-56  —  i.e., 
those  not  captured  in  F,q.  6-57,  yield  the  equation  for  the  conditional  proba¬ 
bility  of  being  on  the  left  given  the  data  and  assuming  no  lef t-to-righ t 
transitions  occur: 
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Figure  6-9.  Front-End/Back-End  Structure  Arising  from  Slow  Integration 
of  Optimal  Estimator 


Figure  6-10.  Hypothesis-Testing  Front  End  with  Slow  Back-End  Estimator 


^x(t)  +n3(1-x(t)) 


H2x(t)  +  ^i4(1  -x(t)) 

R  5503 


Figure  6-11.  The  Left-Right  Transition  Rati" 
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dpL(t)  =  8g(e)PL<t)[1~PL(t)]dt  +  (a-0)g(e)pL(t)[l-pL(t)]dR(t) 


ax(t)  g[l-x(t)] 

~  g(e)PL(t)[1-PL(t>  -  +  -  dK(t) 

Al+ag( e) [ 1~pl( t ) ]  A2+8g( e) [ 1“Pl( t ) ] 


(6-58) 


Equation  6-58  can  in  fact  be  solved  in  closed  form,  as  the  sufficient 
statistics  in  this  case  are  essentially  R(t)  and  K(t).  Specifically,  let 


D(t)  =  Number  of  top-to-bottom  transitions  up  to  time  t 
U(t)  =  Number  of  bottom-to-top  transitions  up  to  time  t 


(6-59) 


Note  that 


K(t)  *  D(t)  +  U(t) 


(6-60) 


and  if  x(0)  =  1 


■PM  ■  •»>-LM 


(6-6la) 


while  if  x(0)  =  0 


U(t)  = 


(6-6lb) 


where 


Define 


f  x  "1  =  smallest  integer  > 


|_  x  J  =  largest  integer  < 


(6-62) 


L(t)  =  e"6g(Ot  e(B-a)g(e)R(t)  ^  +  a§(£  jP(° 


(6-63) 
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(6-64) 

In  this  case,  it 

(6-65) 

It  is  a  straightforward  exercise  to  verify  that  Eq.  6-55  satisfies  Eq.  6-58. 

We  can  now  describe  precisely  the  front-end/back-end  structure  depicted 
in  Fig.  6-10.  Again,  let  (k-1)  T(e)  <  t  <  kT(e)  denote  the  k-th  batch  of  data 
for  the  front-end  processor,  and  let 


Rk  =  R(kT(e)) 

-  R[(k-l)T(e)J 

(6-66a) 

Dk  =  D(kT( c) ) 

-  D[(k-l)T(e)] 

(6-66b) 

Uk  =  U(kT(e)) 

-  U[ (k-l)T( e) ] 

(6-66c) 

Kk  =  K(kT(e» 

-  K[ (k-l)T( e) ] 

(6-66d) 

Lk  =  likelihood  ratio  based  on 
the  k-th  batch  of  data 

(6-66e) 

Then  L(t)  is  the  likelihood  ratio 

Pr(x(t),  t<t  1  p(t)  on  the  right) 

L(t)  =  - 

Pr(x(r),  T<t  |  p(t)  on  the  left) 

assuming  that  no  lef t-to-right  transitions  can  take  place, 
is  easy  to  see  that 

pl(0) 

p  ( t )  - -  . 

PL<0>  +  [l-PL(0)]L(t) 
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The  front  end  then  produces  an  output  based  on  the  decision  rule 

rak  ■  R 

> 

Lk  1  (6-68) 

< 

mk  =  L 

Also  let 


PL(k|j)  =  Ptob[process  on  left  at  time  kT(e)  |ms>  s<j]  .  (6-69) 


The  prediction  step  of  the  algorithm  consists  of  integrating 

PL  =  “  [eyix(t)  +  ep3(l-x(t))]pL 

+  e [ P2X( c )+e ( l~x( t ) ) ] ( 1~pl) 


(6-70) 


from  t  =  kT(e)  to  t  =  (k+l)T(e)  using  as  initial  condition  pL(k|k).  The 
update  step  is  then 

$LL(e>  pL(k+l|k) 

pL(k+l|k+l)  -  (6-71a) 

4-LL(e)  PL<k+l|k)  +  ♦LRll-PL(k+l|k)] 


if  (%  =  L  ,  and 

[  e) J  pL(k+l|k) 

pL(k+l|k+l)  - -  (6-71b) 

[l-<fLL(e)]pL(k+l|k)  +  [l-*LR(e)][l-PL(k+l|k)) 


if  %  =  R,  where  <J>ll  and  4>lr  are  defined  in  Eq.  6-32.  We  note  that  in  this 
case  it  is  possible  to  carry  out  some  of  the  analysis  associated  with  these 
quantities,  and  this  is  done  in  Appendix  C. 

Note  that  in  general  the  exact  computation  of  the  prediction  equation 
(Eq.  6-70)  involves  the  complete  x(t)  sample  path  —  essentially  we  are 
switching  between  two  sets  of  rates,  i.e., 

89 


ALPHATECH,  INC 


PL(t) 

=  e[Ai  x(t)  +  A2[l-x(t)]] 

PL(t) 

. 

pr(c) 

PR(t) 

where 


“PI 

P2 

-p3 

P4 

Ai  = 

ii 

< 

PI 

“P2 

P3 

-P4 

Pl/P2  =  P3/P4 


(6-72) 


(6-73) 


(6-74) 


i.e.,  if  the  steady-state  probabilities  associated  with  A^  and  A2  are  the 
same,  then  A^  and  A2  commute  and  the  solution  of  Eq.  6-72  involves  only  the 
integral  of  x(t),  namely  R(t).  If  Eq.  6-74  does  not  hold,  this  isn't  true 
exactly,  although  for  eT(e)  small,  the  difference  between  the  solution  to 
Eq.  6-72  at  t  -  (k+l)T(e)  from  initial  condition  PL(k|k)  at  t  »  kT(e)  and  the 
approximation 


[1,0] 


expfeAi  Rk+1  +  eA2[T(e)  -  Rfc+i ] } 


PL(klk) 

l-pL(klk> 


(6-75) 


is  0(e^  T^(e)).  If  this  is  ignored,  we  can  use  Eq.  6-75  which  in  expanded 
form  is  as  follows: 


PL(*+l|k)  '  e''1*1  T<£>  +  t2  Rk+llPL(k|k) 

(6-76) 

[»4  T(e)  +  (urii3)Rk+iHl  -  I<E>  +  *2  R^'] 

+  - - - 
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where 

4*1  =  U3  +  H4  ,  i>2  “  PI  +  M2  “  M3  “  U4  •  (6-77) 

Note  in  this  case  that  only  Rk+i ,  and  not  the  entire  x(t)  sample  path,  is  used 

by  the  slow  estimator. 

Finally,  as  in  the  preceding  subsection,  we  can  replace  the  hard  decision 
rule  (Eq.  6-68)  by  a  Bayesian  update.  In  this  case,  the  front  end  again  cal¬ 
culates  Rfc,  D^,  Uk  and  Lk  as  in  Eq.  6-67.  The  back  end  then  performs  a  pre¬ 
diction  step  using  Eq.  6-70  or  the  simpler  (and  exact  if  Eq.  6-74  holds  and 

otherwise  approximate)  version  (Eq.  6-76)  and  a  measurement  update  step  of  the 
following  form: 

PL(k+l|k) 

pL(k+l|k+l)  -  .  (6-78) 

PL(k+l|k)  +  [1  -  PL(k+l|k)]Lk+1 
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SECTION  7 

ASYMPTOTIC  ANALYSIS 

In  this  section  we  present  some  results  on  the  asymptotic  analysis  of 
the  white  noise  filtering  problem  described  in  subsections  6.1.1  and  6.2.1. 

To  facilitate  the  analysis,  we  repeat  the  key  equations  here  in  somewhat  more 
compact  form.  In  Eqs.  6-8  and  6-9  we  presented  the  exact  optimal  filtering 
equations  in  a  particular  set  of  coordinate,  while  in  Eqs.  6-16  and  6-17  we 
presented  the  corresponding  equation  for  a  particular  aggregate  approximation. 


W-  , 
- 


e 


$ 


§ 


£ 


‘a 


£ 


By  eliminating  dy(t)  in  Eq.  6-16  through  the  use  of  Eq.  6-9a,  we  obtain  the 
following: 


dpL(t)  =  eA[pL(t),6i(t),62(t))dt 

+  g(e)  B(pL(t),<$i(t),62(t))dv(t) 

dpL(t)  =  eA(pL(t) ,0,0)dt  +  g(e)  B(pL(t),0,0)dv(t) 

+  g2U)  C(pL(t)pL(t),6i(t),62(t))dt 

d 6 ! ( t )  =  -Ai  «!(t)dt  +  eD(pL(t),61(t),62(t))dt 
+  g(e)  F(pL(t),61(t),62(r.))dv(t) 

d62(t)  =  -A2  52(t)dt  +  eG(pL(t),6i(t),62(t))dt 
+  g(e)  J ( PL( t),61(t),62(t)]dv(t)  . 

where  dv(t)  in  Eq.  6-9a  is  a  standard  Brownian  notion  and  where 


92 


(7-la) 


(7-lb) 


(7-lc) 


( 7— Id ) 
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A(pL,6i,62)  =  Y2  “  (Y1  +  Y2)PL  ~  ni  61  +  h2  <$2  (7-2a) 
B(pL.61,62)  =  [fL  ~  H(pL,61,62)]pL  +  A6X  (7-2b) 
H(PL,6!,62)  =  fR  +  (fL  -  fR)pL  +  A[6i  +  62]  (7-2c) 
C(pL,PL, «l,«2)  =  l*L  ~  H(pL,0,0)]pL  [H(Pl,0,0)  -  H(pMl,62>]  (7-2d) 
D(pl,6i,62)  =  T15  -  n3  <Sl  +  (h4  ~  n5)pL  +  %  62  (7-2e) 
F(PL>$1»62)  =  5l  PL  +  ?2  51  ~  H(pL,6i,62)6i  (7-2f) 
G(pl,6i,62)  =  n8  -  n4  52  +  Cnio  “  n8>PL  +  P9  61  ( 7— 2g ) 
•J(PL. 6li52)  =  C3  “  53  PL  +  $4  62  “  H(pL,6i,62)62  (7-2h) 


Note  that  the  aggregate  approximate  probability  pl  is  now  coupled  to  the  exact 
quantities,  as  we  want  to  write  all  equations  in  a  form  driven  by  the  true 
innovations . 

What  we  show  in  this  section  is  that  the  approximate  filter  (Eqs.  6-16 
and  6-17)  is  a  good  one,  in  that 

q(t)  =  pL(t)  -  pL(t)  (7-3) 

is  small  compared  to  the  amount  of  information  contained  in  or  p^.  More 
precisely,  let 

Y2 

u(t)  =  pL(t) - -  pL(t)  -  .  (7-4) 

Y1  +  Y2 

Then  y(t)  is  precisely  the  deviation  of  p^(t)  from  its  steady-state  value  i^f 
no  measurements  were  available .  We  essentially  show  that  q(t)  is  quite  small 
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in  relation  to  p(t).  In  preparation  for  a  precise  statement  of  this  type, 
we  transform  variables  in  Eq.  7-1  using  Eqs.  7-3  and  7-4  and  also  change  to 
the  slow  time  scale.  The  result  is  the  following  set  of  equations  on  which 
our  analysis  will  focus: 


dp(t)  =  -  r  n(t)dt  +  K(d1(t),62(t))dt  +  -  N(U(t),61(t),62(t))dw(t) 

/  e 


(7-5a) 


Ai  8(e) 

dfi^t)  - 61(t)dt  +  P(y(t),61(t),62(t)dt)  +  -  Q(ii(t),61(t),62(t))dw(t) 

e  /  e 

(7-5b) 

A2  8(e) 

d62( t)  =  -  62(t)dt  +  R(p(t),61(t),62(t)dt)  +  -  S(y(t),61(t),62(t) )dw(t) 

e  /  e 


(7-5c) 


dq(t)  =  -  r 


q(t)dt  +  K(61(t),62(t))dt  +  M(q(t),61(t),62(t))dt 


+  U(q(t),y(t),61(t),«2(t))dw(t) 

/  e 


where  w(t)  is  a  standard  Brownian  motion,  and 


(7-5d) 


r 

=  T1  +  Y2 

(7-6a) 

K(6i,62) 

=  -  ni  <S i  +  q2  52 

(7-6b) 

M(q,6i,62) 

=  (fL  _  fR)<l  +  A(fil  +  62) 

(7-6c) 

N(u,6i  ,62) 

=  [(fL  “  fR)(l-^_y)  -  A(Si  +  «2> ]  [  +  A6i 

( 7— 6d ) 

P(u  ,<$2) 

=  r\5  ~  H3  +  ( T14  -  115) ( u+'J')  +  %  62 

( 7-6e ) 
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Q(u, 61,62)  =  5l(u-hf)  +  €2  Si 


“  [  f  R  +  (^L  ~  ^rHu+'I')  +  S(6i  +  62)  1  61 


( 7-6f ) 


R(u,Si»S2)  =  ns  ~  n7  62  +  C nio  "  nsXu+'f’)  +  ng  Si 


(7-6g) 


S( p ,61 ,62)  =  £3  -  £3(irh|>)  +  £4  62 


-  [fR  +  (^l  ”  fR)(y+4<)  +  S(6i  +  1  52 


( 7— 6h ) 


U(q,u,6l,<$2)  =  (fL  -  fR)q[l-2(p+4»)  +  q)  -  A(6i  +  62) C +  A61  (7-6i) 


Theorem:  Suppose  Y1  Y2  *  0  and  *  £r*  Further  suppose  that 


q(0)  =  0  (i.e.,  pL(0)  =  pL(0),  and  define 


if  Si2(0)  +  622(0)  =  0 


to(£)  _ 


1  g2(e^ 

-  An  - 

r  e 


otherwise 


(7-7) 


Suppose  that  g(e)  =  oCe1/2)  and  e  =  0(g(e)),  i.e.. 


lim  ~  =  0  ,  lim  — —  exists 

e4-0  e1/2  e+0  g(e> 


(7-8) 


Then  for  e  sufficiently  small,  there  exists  a  positive 
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Let  us  make  several  comments  about  this  result.  First,  the  assumption 
that  Yl  Y2  *  0  simply  means  that  the  2-state  aggregate  process  in  Fig.  6-2  has 
a  nontrivial  ergodic  distribution,  while  the  condition  fL  *  fR  states  that  the 
aggregated  version  of  the  measurements  does  contain  at  least  some  information. 
Also,  from  Eq.  7-7  we  see  that  the  need  for  tQ(e)  is  due  entirely  to  boundary- 
layer  effects  caused  by  initial  conditions  on  6^(t)  and  62(t).  Finally  note 
that  the  assumption  that  q(0)  =  0  is  reasonable,  as  it  certainly  makes  sense 
for  the  exact  and  approximate  filters  to  be  initialized  identically. 

What  this  theorem  states  is  that  for  measurement  quality  in  the  range 
specified  by  Eq.  7-8,  the  mean  square  deviation  is  order  e  in  size  when  com¬ 
pared  to  the  mean-square  information  content  as  measured  by  p(t).  The  inclu¬ 
sion  of  the  arbitrary  time  T  in  Eq.  7-9  in  fact  implies  that  this  information 
content  persists  at  a  level  far  above  that  of  the  deviation  q(t).  Note  also 
that  as  should  be  clear  from  the  proof,  this  result  can  be  extended  to  the 
case  of  poorer  measurement  quality,  i.e. ,  g(e)  =  o(e),  with  an  appropriate 
change  in  che  right-hand  side  of  Eq.  7-9.  The  order  however,  represents 

a  critical  cut-off  point  at  the  upper  limit  of  measurement  quality.  This  will 
also  be  seen  in  the  results  presented  in  the  next  section. 

Proof  of  Theorem:  For  notational  simplicity  in  what  follows,  we  will  use 
simplified  notation  of  the  type 

R(t)  =  R(u(t),  <$i(  t ) ,  62(  t ) )  (6-10) 

and  will  refer  to  the  specific  arguments  of  such  a  function  only  when  neces¬ 
sary.  Also,  let  us  note  an  extremely  important  fact:  the  processes  for  which 
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we  are  solving  in  Eq.  7-5  all  represent  differences  between  two  probabilities, 
i.e.,  we  know  a  priori  that 

|y(t)|,  |  6 ! ( t ) |  ,  |62(t)|  ,  | q( t ) |  <  1  .  (7-11) 

Consequently,  since  the  quantities  in  Eqs.  7-6b  -  7-6i  are  all  first-  or 
second-order  polynomials  in  these  processes,  we  see  that  they  are  all  bounded, 
i.e.  ,  there  exists  a  positive  number  K  <  <*>  so  that 

| K( t ) | ,  |M( t ) | ,  |N( t) | ,  |P(t)|,  |Q(t)|,  |R( t ) | ,  |  S ( t )  |  ,  |U( t ) |  <  K  . 

(7-12) 


Our  proof  now  proceeds  by  first  bounding  the  sizes  of  6^(t)  and  62(c)> 
then  q(t) ,  and  finally  u(t).  To  begin,  we  note  from  Eq.  7-5b  that 


6L(t)  =  e  Alt/e61(0)  +  /  e  Al(t  T)/eP(-r)dT  +  f  e"Al(t"T)Q(  r)dw(t)  . 

0  0 


Then,  using  the  fact  that 


M  M 

(  l  xi)2  <  M  l  xi2 

i=l  1=1 


(7-13) 


(7-14) 


we  obtain 


E[ «i2( t) ]  <  3  e"2Alt/eE[612(0) J 

+3  /  /  e-Al[(t_T)+(C"o)]/eE[P(T)P(o)]dTda  (7-15) 

0  0 

+  3  j  e-2 Al( t_T )/ ce[ Q2( t) ] d t  . 

0 
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Using  Eqs.  7-11  and  7-12  we  obtain 


E[612(t)]  <  3  e~2Alt/eE[512(0)]  +  3  K2  /  eAl(t  l)/e  dt 

L  o 

+  3  g2(e)K2  jte-2Ai(t-T)/edT  # 


(7-16) 


Then  using  the  fact  that 


{  e  a^C  T^dx  =  J  e  aTdx  <  /  e  aTdr  = 


(7-17) 


we  obtain  the  bound 


E[  6L2(  t)  ]  <  3  E[6i2(0)]e  2Alt/e  +  3-^2  +  3  ,k282.(j).  .  (7-18) 


In  an  analogous  fashion,  we  obtain  the  bound 


Et 622C t) ]  <  3  E[622(0)1e~2A2t/£  +  3—y~-  +  3  K2g2(-?^  .  (7-19) 


Consider  next  q(t).  From  Eq.  7-5d  and  the  fact  that  q(0)  =  0,  we  have 


q( t )  -  /  e  r<t  T,K(T)dt  +  SiLil  j  e  r(t  T)M(i)dT 


+  /t,-r<t-')uct)d»(T)  . 


Using  Eq.  6-14,  once  again  we  obtain 


(7-20) 


iV.TiTi4 
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E[q2(t)j  <3  /  /  e"rt(t_T)+(t:'o)1E[K(T)K(a)]dTdo 

0  0 

+  3  e4|--)  /  /  e"rt(t_T)+(t"0)1E[M(T)M(<7)]dTda  (7-21) 

£  0  0 

+  3  I  e-2r(t“T)E[U2(T)]dT  • 


Then  employing  Che  inequality 


|xy|  <  -  (x2  +  y2) 

2 


we  find 


+  3  Jte~2r(C'x)E[U2(T)]dT  . 

0 


Again  using  Eq.  7-17  we  obtain 


E[q2(t)]  <  ~~  /e  r(t“T)E[K2(t)]dT 


3  g4(e)  fC  -r(t-T) 


E[M2( x) ]dx 


+  -  slill  I  e-2r(t-T^E[U2(T' jdx  • 


(7-22) 


E[q2(t)]  <  3  /  /Ce  n(t  T)+(t"°)]E[K2(T)]dT 
0  0 

+  3  /  /  e_r[(t"T)+(,:"a)1E[M2(T)]dT  (7-23) 


(7-24) 


•i! 
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Let  us  now  examine  the  first  integral  on  the  right-hand  side  of  Eq.  7-24 
From  Eqs.  7-6b,  7-14,  7-18,  and  7-19  we  see  that 


E[K2(t)]  <  2  m2  E[6!2(t)]  +  2  n22  E[622(t)] 

<  AX  E[622(0)  +  622(0)]e"2  At/e 

+  A2  e  e  At^e  +  A3  e2  +  A4  g2(e) 


(7-25) 


where  A  =  min(A^,A2)  and  A^ ,  A2 ,  A3,  A4  are  appropriate  postive  constants. 
Also,  from  Eq.  7-8  for  e  sufficiently  small* 


E [ K2 ( t ) ]  <  At  E[612(0)  +  622(0)]e"2At/e 


(7-26) 


+  A2  e  e  At  e  +  A5  g2(e) 


for  same  constant  A5 , 


Using  Eq.  7-26  we  then  have  that 


J  e  r(t_T)E[K2(T)]dT 


<  fA1{E[612(0)]  +  E[<522(0)]  +  A2e)  -  [l 

2A-eT 

A5  g2(e)  _ 

+  — -  (1  -e  rt) 


(T-A/e)t> 

e  J 


(7-27) 


A^Elfi^CO)]  +  E[  <522(  0)  ]  }e 


+  A^  g2(  e) 


where  we  have  again  used  Eq.  7-8  and  are  assuming  e  sufficiently  small. 


*This  is  a  typical  place  where  we  use  the  fact  that  g( e)  is  no  smaller  than 
order  c.  If  g(t)  =  0(e),  the  last  term  in  Eq.  7-26  would  be  A3  e2 • 
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We  next  examine  the  second  integral  on  the  right-hand  side  of  Eq.  7-24. 
In  this  case  we  will  be  a  bit  more  careful.  Specifically,  from  Eqs.  7-6c  and 
7-14  we  have  that 


/  e  r(t"T)E[M2(T)]dT  <  3( fL  -  fR)2  /  e“r( t“T)E[q2( T) Jdx 
0  0 

+  3  A2  /  e_r(,:"T){E[612(T)  +  622(t)}dT 


(7-28) 


In  a  manner  analogous  to  the  calculations  in  Eqs.  7-25  -  7-27,  we  can  bound 
the  second  integral  on  the  right-hand  side  of  Eq.  7-30  in  exactly  the  same 
form  as  in  Eq.  7-27.  Thus  for  e  sufficiently  small 


(7-29) 


/  e  r(t'T)E[M2(x)]dr  <  3( f L  -  fR)2  /  e  r(t  T)E[q2(x)]dt 
0  0 

+  A7{E[512(0)]  +  E[622(0)]}e  e_rt 

+  Ag  g2(e)  . 


Examining  next  the  third  integral  on  the  right-hand  side  of  Eq.  7-24  and 
using  Eqs.  7-6i  and  7-14,  we  have 


/  e  2r(t"T)E[U2(t)]dT  <  3  /  e"2r(t  t)E[ dL2( x)q2( x) ]d T 


3  /  e  2?(t  T)E[d22(T)  6 ! 2 ( x)  +  d32( t) 622( T) ]dx 


where 


dt(  t )  =  (fL  -  fR)  [  1~2(  u(  t)  +  *)  +  q  (  t )  ] 


(7-30) 


(7-31a) 


d2(t)  =  A( 1  -  p(t)  -  ¥) 


( 7-31b) 


d3(t)  =  A(  p(  t  )  •+  i{i) 


(7-31c) 


Thanks  to  Eqs.  7-11  and  7-12,  d]_(t),  d2(t),  and  dj(t)  are  bounded  so  that  we 
can  obtain  a  simple  bound 


/  e  2r(C  T)E[U2(x)]dT  <  A9  /  e"2r(t  T)E[q2(x))dT 

0  0  (7-32) 

+  A10  /te'2r<t~T)E{612(T)  +  622(x)}dx 
0 


and  then  again  in  a  manner  analogous  to  Eqs.  7-25  -  7-27,  we  can  obtain  the 
following  bound  for  L  sufficiently  small 


/  e  2r(C  T)E[U2(t)]dt  <  A 9  /  e  2r( t_T)E[ q2( t) ]dt 
0  0 

+  A11{E[6i2(0) ]  +  E[ 622(0) ] }e  e' 
+  Ai2  g2(e)  . 


(7-33) 


Combining  Eqs.  7-24,  7-27,  7-29  and  7-33  (and  performing  a  few  addi¬ 
tional,  straightforward  bounds),  we  obtain  a  bound  of  the  following  form  for 


V 
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We  can  now  use  Eq.  7-34  to  obtain  a  first  bound-  Specifically,  let 


M  =  sup  E[q2(t)J  . 
t>0 


Then  using  Eqs.  7-17,  7-34,  and  7-35,  we  obtain 


A15  M  g4(c)  Aig  M  g2(e) 
M  i  A33  g2(e)  +  A34  E{6i2(0)  +  672(0)}  e  +  — - 2 —  +  - 


A3 3  g2(e)  +  A34  E{6x(0)  + 


A15  M  g  (e)  A16  M  g^(  e) 


for  e  sufficiently  small.  Using  this  bound  in  the  two  integrands  in  Eq.  7-34 
and  using  Eqs.  7-17  and  7-37,  we  see  that 


E[q2(t)]  <  A33  g2(e)  +  A34  E{6i2(0)  +  622(0)}e  e 


(7-38) 


A15  a17  84(£)  a16  a17  _ 

+ - +  -  g2(  e) 

r  e  2  r 

<  a18  g2(e)  +  A34  E{6i2(0)  +  622(0)}e 


where  the  last  inequality  follows  from  Eq.  7-8  and  is  valid  for  e  sufficiently 
small.  From  this  we  immediately  see  that 


sup  E [ q2 ( t ) ]  <  A34  g2(e) 
t>t0(c) 


(7-39) 


where  t0(e)  is  given  by  Eq .  7-7 
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Now  we  turn  our  attention  to  p(t).  Without  loss  of  generality,  assume 
p(0)  “  0  (this  is  the  worst  case  coresponding  to  no  prior  information).  Then 
from  Eq.  7-5a 


* 


p(t)  -  /  e  r(t-T)K(T)dT  +  I  e  r(t  T)N(T)dw(t) 

0  S  £  0 


(7-40) 


First  we  look  at  the  mean  of  p(t): 


s 


E[ u(t) ]  =  /  e  r(t_T)E[K(T)]dT 


Then 


(7-41) 


|E[ 


W(t)]|  <  Je  r(t“T) |E[K(t) ] | dx  <  /te"F(t~T){E[K2(T)]}1/2dt 


(7-42) 


£} 


Then,  using  Eq.  7-26  and  the  fact  that  for  xj  >  0 


M  M 

(  l  *i)1/2  <  I 

i=l  i=l 


(7-43) 


t 


V 


we  find  that  for  e  sufficiently  small 


|E[p(t)]|  <  /  e  r(t  T) [Bl  e  AT/e  +  B2  g(e)]dT 


-rt 


V—  fl-e^)tw!ii^fl-e-rt1 


A  -  e  r 


(7-44) 


<  B3  g(e) 

where  the  are  positive  constants  and  again  we  have  used  Eq.  7-8  in  the 
last  step. 
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Next  we  examine  E[p2(t)J.  To  do  this,  let 


x  =  /  e"r(t'T)K(T)dT 
0 

g(e)  c  -rfr-T'* 
y  =  —  ■  -  -  /  e  ^N(-r)dw(r) 

/  e  0 


(7-45a) 

(7-45b) 


Note  also  that  from  the  analysis  in  Eqs.  7-22  -  7-27  together  with  Eq.  7-43, 
we  can  deduce  that 


{E(x2)}1/2 


{  A!  E[6i2(0)  +  622(0)]e  e  f 

<  <  -  +  g2(c)  V 

I  2  A  V 

<  B4[612(0)  +  622(°)  1 1/2  /T“  e"rt/2  +  B5  g( 


1/2 


e) 


(7-46) 


Next,  let  us  obtain  an  upper  bound  on  E(y2).  Specifically,  using  Eqs.  7-12 
and  7-17 


E(y2)  -  /  e~2r(t"T)E[N(T)2]dT 

e  n 


K2  g2(e) 
2  r  c 


A  B6 


g2(  e ) 
e 


(7-47) 


Now, 
E[ u2( t )  ] 


since  p(t)  *  x+y,  we  can  obtain  the  lower  bound 
-  E((x+y)2] 

>  E(y2)  -  2{E(x2)}1/2  (E( y2) } 1/2 

_ g(c)  _  —  rt/2 

>  E(y2)  -  2/  B6  -  {B4  E[6!2(0)  +  622(0)]  Zee  +  B5  g(e) } 

/  e 


( 7-48) 


-Nr 
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We  now  proceed  to  find  a  lower  bound  on  E(y^).  To  begin,  note  that  for  t>l 


E(y2)  >  /  e"2r(t“T)E[N2(T)]dx 


Now  from  Eq.  7-6d 


(7-49) 


N(t)  -  C0(t)  +  Ci(t)  y(t)  +  C2  W2(t) 


where 


(7-50) 


C0(t)  *  [  ( f  L  -  fR)  (1-ip)  -  A(«i(t)  +  «2(c  >>]  ♦  +  A6x(t) 


CL(t)  =  (fL  -  fR)  (1-2*)  -  At6L(t)  +  «2(t)l 


(7-51a) 


( 7— 5  lb ) 


fR  “  fL  • 


( 7-5 lc ) 


Substituting  Eq.  7-50  Into  Eq.  7-49  and  discarding  several  nonnegative  terms, 


we  obtain 


Ely2]  >  g--c)  J  e_2r(t  t)E{Co2(t)  +  2  C0(t)  (^(t)  y(  t) 
€  0 

+  2  C0( i )  C2  y2( t )  +  2  (^(x)  C2  n3(t)}dT  . 


(7-52) 


From  Eq.  7-51a 


C(j2(t)  “  (f^  -  fR)2  (l-*)2  *2  +  2  ( f  l  “  fR)  (l-*)  *A[6j(t)  -  *  6  j  ( t )  *  62(t)J 

+  A  2 1  6  x  ( t )  -  ii  6  j  (  t )  +  *  6  2  (  t )  J  2 


>  B7  -  Bg  ;  5 1  ( t )  -  *  6  x  ( t )  -  *  6  2  ( t )  | 


(7-53) 


where  By  (  f  L  -  fR)2(l-*)2  *2  and  Bg  =  2|(fL  -  fR)  (l-*)*A|  are  positive 


■  o  ns  t  ant  s .  Thus 
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E [ C02 ( t ) ]  >  B7  -  B8  E[ | 6i€t)  -  6x(t)  -  <J,  52(t)|] 

>  B7  -  B8{E[6!(t)  -  *  Si(t)  -  ty  «2( 11  > ) 2  }1/2 

(7-54) 

>  B7  -  B8{B9  E[6i2(0)  +  622(0)]1/2  e-At/e 
+  B^q  e1/2  e“At/2e  +  B^  g(e)} 

where  the  last  Inequality  is  obtained  in  the  same  manner  as  Eq.  7-26  followed 
by  an  application  of  Eq.  7-43.  Note  that  is  we  define 

if  6l2(0)  +  622( 0)  =  0 

(7-55) 

otherwise 


Then 


ti(e) 


-  e  in  g(e) 


E [ C02( t ) ]  >  B12  >  0  for  t  >  ti(e)  . 


(7-56) 


Note  that  for  e  small  enough  t^(s)  =  tg(e)  =  0  or  t^(e)  «  tg(e).  Now,  from 
Eqs.  7-52  and  7-56  we  have  that  for  t  >  max(ti(t),l) 


E  [  y 2  ]  >  B12  ~  2  I  E{C0(t)C1(t)u(t)  +  C0(t)C2u2(t) 


t-l 


(7-57) 


+  C1(t)C2u3(t)}<1t 


Since  Cq( t )  and  C^(t)  are  bounded  and  ju(t)j  <  1 ,  we  deduce  that 


2  2  t 

E(y2)  >  B12  S-lL>  -  g13  8-1*1  ’  j  |Elu(T>]  :  +  E[u2(t)]  }dx  .  (7-58) 

£  £  t-l 


Then  from  Eq.  7-44  we  have  that  tor  t  *  max(  t  [  (  t ) ,  l ) 
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E(y2)  >  b12  gZ^y  -  B3  b13  -  b13  j  E[W2(T)]dT 

e  e  e  t-1 

>  BU  -  B13  J  E[u2(x)]dT 

£  £  t-1 


where  B14  >  0  and  e  is  sufficiently  small. 

Combining  Eqs.  7-59  and  7-48,  we  have  that  for  t  >  max(t3(e),l) 


E[y2(t)]  >  bu  giLsl  _  Bi3  j  E[y2(T)]dT 

£  £  t-1 

-  B15  E[6!2(0)  +  622(  0)  ]  g(e)  e~Tt/2  -  B16  . 

/  e 


Then  for  t  >  max( tQ( e) , 1)  and  e  sufficiently  small 


E(u2(t)]  >  B17  -  B13  ^[^(Oldt 

e  e  t-1 


where  B^7  >  0. 

Now,  let  T  be  any  time  >  0  and  define 


N  =  sup  E{ y2( t )  ]  . 

t  >T 


Then,  using  Eq.  7-61  we  obtain 


N  >  sup  E[ y2( t )  J 

t >max  (to(e),T+l) 


>  B17  Sli.il  -  b13  N  ^2(-— 

e  c 


from  which  we  deduce  that 
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* 


S 

%* 


w 


3c 

«r. 


v. 


with  B^g  >  0  and  for  c  sufficiently  small.  Combining  Eqs.  7-39  and  7-64  com¬ 
pletes  the  proof  of  the  theorem. 

In  the  next  section  we  present  simulation  results  that  corroborate  this 
result  and  that  in  fact  suggest  several  other,  stronger  results.  Several  con¬ 
jectures  are  presented  in  the  conclusions. 
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SECTION  8 

MONTE  CARLO  ANALYSIS  OF  FILTERING  PROBLEMS 

In  this  section,  the  simulation  of  the  filtering  techniques  proposed  in 
the  previous  sections  are  presented.  Both  the  techniques  used  to  simulate  the 
processes  of  interest  and  the  filters  themselves  are  described,  followed  by 
the  results.  The  goals  of  the  simulations  were  two-fold.  First  they  were 
intended  to  provide  support  for  results  and  conjectures  regarding  the  quality 
of  various  filtering  approximation  schemes.  Secondly,  they  were  intended  to 
clarify  the  types  of  phenomena  that  can  occur  in  filtering  problems  involving 
processes  with  various  time-scale  separations,  noise  levels  and  information 
rates.  As  we  will  see,  the  results  obtained  from  the  simuations  support  the 
results  of  the  previous  section  as  well  as  suggesting  several  additional 

i 

conjectures.  ! 

J  i 

i 

All  simulations  were  carried  out  on  a  PC's  Unlimited  286  personal  j 

i 

computer.  Simulation  routines  were  written  using  Turbo  Pascal.  i 

i 

8 . 1  Simulation  Techniques 

Before  discussing  specific  simulations  we  describe  the  techniques  required 
to  simulate  both  the  processes  of  interest  and  the  filters  themselves.  The 
techniques  of  interest  relate  to  the  simulation  of  Wiener  Processes,  Markov 
Processes  and  stochastic  differential  equations. 


I 

i 

i 

i 

i 
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8.1.1  Noise  Processes 

In  the  simulations  we  need  to  generate  increments  of  Brownian  motion 
processes.  These  increments  were  generated  by  manipulating  samples  obtained 
from  a  uniform  random  number  generator.  The  sample  values  were  drawn  from  a 
uniform  distribution  on  [0,1] ,  summed  and  scaled  to  approximate  a  value  drawn 
from  a  Gaussian  distribution.  The  scaling  factor  Q  was  given  by 


(8-1) 


where  I  is  the  intensity  of  the  noise  process,  At  is  the  time  increment 
between  sample  times  in  the  simulation,  and  N  is  the  number  of  values  drawn 
from  a  uniform  distribution  in  order  to  generate  a  simple,  approximately 
Gaussian  random  variable.  A  nominal  value  of  N  =  10  was  chosen  for  these 
s imulations . 


8.1.2  Markov  Processes 

For  each  of  the  filtering  simulations  described  in  Section  6,  a  sample 
path  for  the  4-state  Markov  Process  of  Fig.  6-1  is  required.  ~he  generation 
of  a  sample  path  for  a  general  continuous-time  Markov  process  can  be  accom¬ 
plished  by  generating  the  sequence  of  successive  states  that  are  visited  and 
the  time  intervals  between  the  transitions  from  state-to-state  in  the  sequence. 
Specifically,  let  Xfj  be  the  state  after  the  N-th  jump  and  t^j  the  time  of  the 
N-th  jump,  we  can  define  tn  to  be  the  N-th  holding  time  for  the  system  and  p-y 
the  conditional  jump  probabilities  where: 

TN  =  CN  "  CN-1  (8-2) 

Pij  =  Pr(xN=j  I  XN-1  =  1  >  (8-3) 
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The  generation  of  the  sample  path  then  proceeds  as  follows.  Suppose  that  we 
have  generated  xjj-i  =  i  at  time  t^-i .  Then  we  generate  two  conditionally 
independent  random  variables  (conditional  on  xpj-i  =  i),  namely  x^  and  tn* 

The  required  density  for  is  gi /en  by 


where 


and 


P(tn  |  xN  =  i)  =  ATi  exp{-ATi  tn}  (8-4) 


A^j  =  the  transition  rate  from  state  i  to  state  j, 

^Tj  =  I  ^ij  •  (8-5) 


The  technique  used  to  generate  the  required  random  variables  was  to  gen¬ 
erate  a  value  z'  from  a  uniform  distribution  on  [0,1] .  A  new  value  z  is  then 
calculated  as 


so  that 


-  A"1  in  (z’) 
Ti 


pz(z)  =  ATi  exp {-  At.  z} 


(8-6) 


(8-7) 


To  decide  on  the  order  of  states  entered  for  the  sample  path,  another 
variable  was  drawn  from  a  uniform  distribution  on  [0,1]  .  A  series  of 
thresholds  were  then  calculated  as 


Y0  =  0  (8-8a) 

N-l  Aik 

Yk  =  l  —  .  (8-8b) 

k=0  XT. 

,  .  1 
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The  state  after  the  N-th  jump  is  then  obtained  as 


*N  =  J  ^  *j-l  <  V  <  , 


(8-9) 


which  corresponds  to 


(8-10) 


0  k=i 


8.2  SIMULATION  OF  DIFFERENTIAL  EQUATIONS 

In  order  to  simulate  the  performance  of  the  various  filtering  approaches, 
it  is  necessary  to  integrate  the  differential  equations  describing  the  filters. 
We  consider  two  cases: 

1.  Filtering  with  noise  present  (Stochastic  Differential  Equations) 

2.  Filtering  in  the  noiseless  environment 


8.2.1  Simulation  in  a  Noisy  Environment 


In  the  cases  where  we  observe  a  process  with  additive  noise,  we  generally 
obtain  filtering  equations  of  the  form 


dp  =  F(p)dt  +  G(p)dv 


(8-11) 


where  £  is  a  vector  of  probabilities,  v  is  an  innovations  process  with  unit 
intensity,  and  F  and  G  are  column  vectors  which  we  denote  by 


fl(p) 


fn(£) 


81  (P) 


(8-12) 


8n(P) 


1  *  n  r* 
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In  order  to  simulate  the  filter  (and  the  process  of  interest)  on  a  digi¬ 
tal  computer,  it  is  necessary  to  discretize  time  into  segments  of  length  At 
and  propagate  the  equations  over  each  interval.  The  method  that  was  chosen  to 
propagate  the  differential  equation  over  each  interval  is  a  Runge-Kutta  scheme 
adapted  to  the  problem  of  nonlinear  stochastic  differential  equations  [8] . 

This  technique  integrates  a  differential  equation  from  time  t^  to  time 
CN+1  =■  tfj  +  At  as  follows: 


£'(tN+l)  =  £(tN)  +  £(p(tN))At  +  ^(£Icn))  Av 

/F(£(tN))  +  FCp'Un+O)  N 
£(tN+i)  =  p(tN)  +  - ; -  At 


(8-13a) 


(8-13b) 


^  G(£(  tjj))  + 


G(£'(ti^i)) 


where  F  is  a  modified  function  required  so  that  our  simulation  approximates 
Eq.  8-11.  Specifically,  it  is  known  that  as  At+O,  the  solution  of  Eq.  8-13 
converges  to  the  solution  of  the  differential  equation 


dp( t )  =  F  + 


1  3<i(£) 

- G(p) 

2  3£  J 


dt  +  G(p)dv  =  F  dt  +  G  dv 


(8-14) 


We  therefore  require  that 


1  3G(£) 

F(p)  =  F(p) - G  (p) 

2  3p 


(8-15) 


whe  re 
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Therefore,  the  matrix  of  expressions  F  that  we  use  in  Eq.  8-13,  differs 
from  the  matrix  j?  in  the  equation  describing  the  process  we  wish  to  simulate 
We  proceed  by  providing  the  equations  that  were  used  to  simulate  the  various 
filters  conjectured  for  the  noise-corrupted  observation  model. 


I.  EXACT  OPTIMAL  NONLINEAR  FILTER 

The  original  equations  were  defined  in  Eqs.  6-8  and  6-9  as 

dpL(t)  =  e[ Y2  "  (Yl+Y2)PL(t)]dt  +  e[“hl  6i(t)  +  n2  $2(t)]dt 
+  g(e)  [ f l  “  h(t)]pL(t)dv(t)  +  g(e)  A6i(t)  dv(t) 

d«i(t)  =  -Ai6^(t)dt  +  £[ri5  -  n36l(t)  +  (n4  ~  n5)pL(t)  +  06  <52(t)ldt 
+  g(e)[5l  PL(C)  +  ?2  dl(t)  ”  h(t)  dl(t)]dv(t) 

d62(t)  =  -A2  62(t)dt  +  e  I  rig  -  04  62(0  +  (nio  “  h8>PL(t)  +  ng  6i(t)]dt 
+  g( e) [ 53  “  53  PL(t)  +  54  d2(c)  “  h(t)  ^2Ct>  )ci vCt )  • 


Casting  this  in  the  format  of  Eq.  8-14  we  obtain 
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F(p)  = 


:  { Y2  ~  (yi+Y2)PL(c)  +  12  <52(t)  ~  11  ^(t)] 

C Ai  +  c  n3)6i(t)  +  e[(n4  "  H5>PL(t)  +  05  +  %  <$2(t) 
(A2  +  e  H7)62(t)  +  e[(nio  _n8)PL(t)  +  %  +  19  61(c)] 


(8-17a) 


fL  ”  h(t))pL(t)  +  A6i(t) 
G(£)  =  g(e)  Cl  PL(t)  +  (C2  "  h)<5l 

C3(l  -  pL(t))  +  (C4  -  h)62 


(8-17b) 


3G(p) 


(fL  “  h)  -  (fL-fR)pL(t)  A(1  -  pL(t)) 


C2  -  (h  +  A6i(t)) 
-A62(t) 


*  g(e)  Cl  “  (fL_fR)6l(t) 
-C3  -  (fL-fR)62(t) 


-ApL(t) 

~A6i(t) 

A 

C4  -  (h  +  A62(t)) 


(8-18) 

1  3G(p) 

and  letting  ]?  =  _F(p) - -  G(j>),  we  can  apply  the  propagation  equation 

2  3p 

given  by  Eq.  8-14.  The  increment  of  time,  At,  was  restricted  to  be,  at  most, 
10  percent  of  the  mean  time  between  the  fastest  transitions  of  the  Markov 
Chain. 


H .  TWO-STATE  APPROXIMATE  FILTER 

This  filter  was  described  in  subsection  6.1  and  satisfies  Eqs.  6-16 
and  6-14.  It  was  simulated  in  exactly  the  same  manner  as  the  exact  filter 
described  above  except  that  the  calculations  are  simplified  because  6i(t), 
6 2 ( t )  are  assumed  to  be  zero.  We  therefore  obtain 


116 


ALPHATECH,  INC 


f(pl(0> 

=  e(Y2 

~(  Yl+Y2)PL(t)> 

(8- 19a) 

G( PL( t ) ) 

=  g(e) 

PL(0  <l  "  PL<0) 

( 8— 1 9  b ) 

—  G( pL( t )  ) 

=  g(e) 

(f,.-fR)  (1  -  2  PL(t» 

( 8-  19c ) 

F(PlU))  =  ^2  ~  (Yl+T2)  Pl(O) 

1  o 


2(e)  (fL-fR)2  pL(t)  (1  -  PL(t))  (1  -  2  pL(t))  . 


(8-19d) 


Using  these  equations,  Eq.  8-14  was  applied  once  again. 


III.  FRONT-END/ BACK-END  PROCESSOR 

Simulations  were  performed  for  the  front-end/back-end  processor  for  which 
the  front  end  calculated  the  likelihood  ratio,  L(t)  using  the  equations  given 
by  Eq.  6-26,  with  dvR,  dv^  given  by  Eq.  6-25b.  Once  again,  Eq.  6-26  is  a  sto¬ 
chastic  differential  equation  driven  by  white  noise  and  requires  a  corrected 
drift  term  for  the  Runge  Kutta  technique.  If  we  define  p( t )  as 


L(t) 

p(t)  “  Qlt(c) 


Qrt( t ) 


(8-20) 


where  L(t),  Qlt>  Qrt  are  defined  as  in  Eq.  6-26,  we  can  use  our  previous 
formulation  to  simulate 
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dp(t)  =  F(£(t))dt  +  G(£(t)) 


d  t ) 


where 


F(£)  -  \2  ~  (  X  2 )  QltCO 

>4  “  ( X3+X4)  Qrt( t ) 


(8-2  la) 


G(£( t ) )  -  Cr(£)  -  GL(£)  -  g(e)(hL(t)-hR(t))  L(t)  .  0 

0  .  g(e)  (a  -  hL<t))  QlT  . 

g(e)  (a  -  hR(t))  Qrx  •  0 

( 8—2 lb ) 

with  G  segmented  into  two  column  vectors,  one  associated  with  each  of  the  two 
processes  v^,  vr.  Note  that  vl  and  vr  are  not  innovations  processes.  Rather 
we  have  that 


d vr( t )  dy(t)  -  g(e)  hL(t)dt 

dvL(t)  dy(t)  -  g(e)  hR(t)dt 


(8-22) 


whe  re 


^l( t )  -  S  +  (a-8)  QltCi) 
hR(t)  =  8  +  (a-8)  Qrx(c) 


(8-23a) 

(8-23b) 


00 
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and  the  true  innovations  is  given  by 

dv(t)  =  dy(t  )  -  g( e  )  h(t)dt 


Since  dvg  and  dv^  differs  from  dv  only  by  a  drift  term,  we  can  simply  obtain 
the  correction  terms  corresponding  to  vr  and  vp  and  add  them.  Specifically 


i 


(hL(t)  -  hR(t)) 
0 
0 


g(  t  )(a-B)U  t ) 
0 
0 


-g(e)(a~6)L(t) 

0 

g(e)(o-e)(l-2QRT(t)) 

(8-24a) 


f 


acL 


o  o 

0  g(e)(a-B)(l-2QLT(t)) 

0  0 


0 

0 


0 


(8-24b) 


Finally,  we  obtain  F(£)  for  simulation  purposes  by  summing  the  correction 
terms  associated  with  dv^  and  dvR  to  obtain: 


-Y2  g2(e)  (a-6)2  (QLT(t)  -  QrT( t ) )  ( 1-2Qrt( t ) )  L(t) 


F(p) 


X2  _  (*i+A2)  Qlt(  c  ) - g^(e)  QlT^)  t )  ) 

*4  "  (A3+A4)  QRj(t) - y~  g2( e )  Qrx( t )  (l-QRj(t))  ( l~2QRy( t ) ) 


Once  again,  the  increment  of  time.  At,  was  small  (at  most  10  percent) 
compared  to  the  mean  time  between  the  fastest  transitions  of  the  Markov  Chain. 
The  initial  values  of  L,  Q^t  and  QrT  were  chosen  according  to  Eq.  6-30. 


Once  a  value  was  obtained  for  L(t)  on  each  interval,  that  value  was  used  by 
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another  routine  which  performed  a  prediction  calculation  using  Eq .  6-54  and  a 
"measurement"  update  using  Eq.  6-37.  Both  of  these  calculations  involve  only 
straightforward  computations. 

IV .  FRONT-END/ BACK-END  PROCESSOR  WITH  AVERAGING  ASSUMPTION 

The  simulation  of  this  filter  is  very  straightforward,  using  Eqs.  6-46 
and  6-47  to  obtain  L(k)  on  each  interval.  A  fast  routine  simply  calculates 
the  integral  of  dy(t)  by  performing  a  summation  of  the  form: 

7  =  j  dy(T)  (8-25) 

kT(e)<T<(k+l)TU)  . 

The  "slow"  routine  was  then  used  to  propagate  the  probability  in  a  two-step 
fashion  with  a  prediction  step  and  an  update  step  as  in  III  above. 

8.2.2  Simulation  in  a  Noiseless  Environment 

In  general,  simulation  of  the  filtering  equations  for  the  noiseless  envi¬ 
ronment  is  simpler  than  in  the  noisy  environment  because  the  equations  are  no 
longer  driven  by  a  Wiener  process. 


r\ 

L 


I.  EXACT  PROBABILITY  EQUATIONS 

The  exact  equations  for  the  probabilities  of  being  in  the  left  or  right 
states  are  given  by  Eq .  6-49.  Equations  6-49a  and  6-49b  are  ordinary  differ¬ 
ential  equations  which  can  be  solved  using  a  simple  integration  routine.  In 
this  case  the  Runge-Kutta  technique  requires  the  calculation  of 

pL(t+At)  =  Pl( t )  +  pL(t)  •  At  (8-26a) 

1  .  JL 

pL(t+At)  =  pj(t)  +  -  fpL(t)  +  pL(t+At)]  At  .  (8-26b) 
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Iru-st  steps  were  implemented  with  time  Intervals  such  that  At  was  small  com¬ 
pared  to  the  time  between  the  fastest  transitions  in  the  Markov  chain.  The 
<  » .  .a  l.tt  ions  were  performed  between  jump  times  with  an  additional  integration 
t  r  t  :ie  residual  time  Atjg  just  preceding  each  jump: 


tN  "  CN-1 

AtN  =  US  "  tS-l)  - - At 


(8-27) 


[_  x  _j  denotes  the  largest  Integer  <  x 


At  jump  times,  Eqs.  6-49c  -  6-49d  were  used  to  update  the  probabilities, 


l 1 •  FRONT-END/ 8ACK-END  PROCESSOR  WITH  AK ,  ARr 

In  this  case,  the  processor  structure  in  which  ARt  and  AK,  the  increments 
in  the  upper  state  residence  time  and  the  total  number  of  counts  respectively 
are  determined  for  Individual  segments  of  time  by  a  fast  processor.  These 
values  are  then  used  by  a  slow  processor  to  calculate  state  probabilities. 

The  front-end  processor  determines  two  quantities,  R( t )  and  K( t )  where 


T 

R(tN)  “  R(tfl-l)  +  (tfl  -  t^-))  x(tjg-l) 


(8-28a) 


K(tN)  =  K(  t  jq—  1  )  +  I  x(  t  isi)  ~  x(tN-l)l 


(8-28b) 


x(  tg) 


^  0  If  p(t)  is  in  state  2  or  4  at  time  tg  . 
^  1  if  p(t)  is  in  state  1  or  3  at  time  tg  . 


Out  e  t  tie  se  quantities  are  obtained  for  a  time  interval  [RT(e),  (k+l)T(t)), 
r hi- v  trc  used  by  the  slow  processor  to  update  probabilities  using  the 
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di t terent Ial  equation  given  by  Eq .  6-56.  In  this  case  we  must  be  careful  when 
integrating  the  equation  because  of  the  dK(t)  driving  term  which  represents 
increments  in  a  jump  process.  The  equations  that  we  would  normally  implement 
ar<  in  the  standard  Runge-Kutta  format 


Pl(t+At)  =  pL(t)  +  F  ^  (  Pl( t ) ) ARt  +  F2(pL(t))At  +  G( pL( t ) ) AD  +  G2(pL(t)AU 

(8-29) 

using  the  terminology  of  Eq .  6-59  to  rewrite  Eq .  6-56.  The  second  step  of  the 
integration  is  per formed  using 

ARt 

PjJt-t-At)  =  pl(  t )  +  (Fi(pL(t)  +  Fj  (  Pl(  t+ At  )  ) 


-  At 

+  !F2(PL(t))  +  f2( Pl( t+At)) )  — 

_  AD 

+  +  Gi(pL(t+At))  ]  — 

+  f  G2 ( Pl( c  > )  +  G2(PL(t+6t>) 1  ~ 


(8-30) 


Let  us  analyze  the  error  introduced  by  the  use  of  Eqs.  8-29  and  8-30.  First, 
it  we  perform  Taylor  series  expansions  of  F^ ( p^( t+At ) ) ,  etc.,  in  Eq .  8-30  and 
keep  only  terms  of  zeroth  and  first  order,  we  have 


pj_(t+At)  ?  pi  (  t  )  +  Fj  (  pL(  t  )  )  ARt  +  F2  (  Pl(  C  ) )  At  +  G^  (  pl(  t  )  )  AD  +  G2(pL(t))AU 
+  --  -  |  F|(  pL(t  ))ARt  +  F^(  PL(  t  )  )  At  +  G  j  (  pL(  t  )  )  AD  +  G^(  Pl.(  t  )  )  AU  ]  . 

El(p[(t))ARt  F2(pL(t))At  +  Gj(pL(t))AD  +  G2  (  p|(  t  )  )  Al' 1 
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Now,  recalling  the  definition  of  the  functions  F} ,  F2  ,  G\  and  G2, 


flCPLtO)  =  e[h2“PA  ~  (pi+U2“P3_P4)PL(t)] 
+  (a-8)g(e)pL(t)(l-pL(t)) 


(8-32a) 


f2(pl(c))  =  el U4  "  (p3+M4>Pl(c)] 

+  8  g(e)  PL(t)  (I-PlO1)) 


(8-32b) 


Gi(PL(0)  = 


-g(e)  apL(t)  (I-PLC1)) 
*1  +  ag(e)  (l-PL(t)) 


(8-32c) 


G2(pL(t))  = 


-g(e)  8pl( t )  (l-PL(t)) 
*2  +  8g(e)  (l-PL(t)) 


(.8-3  2d) 


and  substituting  into  Eq.  8-31  we  obtain 


Pl( t+At )  2  pl( t )  +  [(a-B)g(e)  Pl(1_Pl)  +  e[(p2“U4)  “  ( U1+P2“P3"P4)Pl( t ) 1 ] ARt 


+  [8g(e)  Pl<1_Pl)  +  £[P4  "  (v3+vOpl(  OJ  ]At 


ag(c)  pl(  t )  (l-pL(t))  8g(  e)  pL(t)  (l-Pl/O) 

-  AD - AU 

*1  +  ag( e)  (l-PL(t))  ^2  +  Bg(e)  (l-pL(t)) 


+  PL(f)  (l-PLCt))  (l-2pL(t)) 


g2(  e) 


(a-B)2  AR2  +  82  At2  +  —  AD2  +  —  AU2 
t  %2  \2 

1  2 


a(a-8) 

+  2  8(a-6)At  ARt - ARt  AD 

*1 


BU-B)  aB  Bz 

-  ARt  AU - At  AD  -  —  At  AU 

\2  \\  \2 


aS 

+  — —  AD  AU 

XiA2 


(8-33) 
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To  compare  Eq.  8-33  to  the  exact  equations,  consider  the  relationship  between 
Apl  and  AL  for  small  AL.  Substituting  Eq.  8-34  into  Eq.  6-65,  we  obtain  Eq. 
8-35  and  finally  Eq.  8-36 

L( t+At )  =  L(t)  +  AL  .  (8-34) 

-  ApL( t ) 

AL  =  -  +  0((AL)2)  (8-35) 

pL(t)  (l-pL(t)) 

AL  *  -  0At  -  (a-0)ARt  +  ~  AD  +  AU]  g(e)  +  0(g2(e))  .  (8-36) 

Expanding  the  exact  expression  for  L,  (Eq.  6-63),  using  a  Taylor  expansion  for 
the  exponential  and  Binomial  theorem  for  the  power  factors,  we  obtain  exactly 
Eq.  8-36. 

Therefore  we  see  that  the  basic  Runge-Kutta  technique  will  be  accurate  to 
within  0(g2(e)) . 

III.  FRONT-END/ BACK-END  PROCESSOR  USING  LIKELIHOOD  FUNCTION 

Simulations  were  performed  for  the  front-end/back-end  filter  structures 
for  which  L(t),  the  likelihood  ratio  described  by  Eq.  6-64  is  calculated  using 
the  equation  given  by  Eq.  6-63.  The  calculation  of  L(t)  is  straightforward 
and  performed  over  time  increments  of  length  T(e),  based  upon  the  ARt,  AD,  AU 
statistics  determined  by  the  fast  integrator  and  counters  of  the  "front  end." 
These  statistics  were  determined  from  the  sample  path  in  the  same  manner  as 
described  for  II  above.  The  values  of  L(t)  were  calculated  and  passed  to  a 
"slow  routine"  which  calculated  the  state  probabilities  using  a  two-stage 
process  comprised  of  prediction  and  update  calculations. 
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The  prediction  step  was  implemented  using  the  "approximation  technique" 
given  by  Eqs.  6-76  and  6-77.  For  the  choice  of  y's  that  was  used,  however 
(Ui=l),  the  condition  of  Eq.  6-74  was  satisfied  so  that  Eqs.  2-76  and  2-77 
were  exact.  The  update  calculation  was  done  using  Eq.  6-78. 

8.3  SIMULATION  RESULTS 

8.3.1  Probability  Propagation  for  White  Noise  Model 

In  subsection  8.2,  the  simulations  for  four  filtering  techniques  in  the 
noisy  observation  case  were  described.  The  results  for  each  of  these  cases 
are  provided  in  Figs.  8-1  through  8-14  and  are  discussed  below.  The  plots 
of  the  filter  outputs  show  two  pieces  of  information.  The  "square  wave"  plot 
shows  the  sample  path  of  the  system  in  terms  of  lef t-to-right  transitions. 

A  high  value  in  the  plot  (.75)  indicates  the  system  is  actually  in  the  left 
pair  of  states  and  a  low  value  (.25)  indicates  the  system  is  actually  on  the 
right.  The  probability  of  being  in  the  left  pair  of  states  as  determined  by 
the  filter  is  superimposed  on  this  plot.  Since  a  large  value  of  pl  represents 
a  conclusion  that  we  are  on  the  left  and  a  small  p^  that  we  are  on  the  right, 
an  indication  of  filter  performance  is  the  amount  of  time  that  the  filter 
output  and  the  sample  path  are  both  in  the  top  or  bottom  of  the  plot.  This 
provides  a  qualitative  measure  of  how  often  the  filter  reaches  the  correct 
"conclusion. " 

I.  EXACT  FILTER 

The  simulations  for  this  and  all  other  white  noise  cases  used  the  param¬ 
eter  values  given  in  Eq.  8-37 
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Xj_  =  X4  =  1  (8-37a) 
X2  =  A3  =  5  ( 8—  37b) 
Ml  =  U2  ~  u3  =  w 4  =  1  (8-37c) 
AC  =  0.1  (8-37d) 

e  =  0.01  (8-37e) 


where  AC  is  the  time  increment  between  calculations. 

The  results  for  four  simulations  are  provided  for  the  exact  filter  case. 
Each  simulation  was  done  with  an  identical  sample  path,  but  with  different 
values  for  g(e)  the  signal  magnitude  in  the  output.  With  a  noise  intensity 
of  1,  Figs.  8-1  through  8-4  provide  results  for  g(e)  =  1.00,  0.30,  0.10,  and 
0.03  corresponding  to  c® ,  and  e^M,  respectively. 

Several  features  of  the  results  are  worth  noting.  The  filter  appears  to 
display  a  "switch"  type  of  behavior  for  larger  values  of  g(e),  particularly 
g(e)  =  1.  For  g(e)  =  performance  deterioriates  somewhat,  but  still 

provides  a  correct  result  in  the  sense  that  p >  0.5  for  the  majority  of  time 
that  the  system  is  in  the  left  pair  of  states  and  p^  <  0.5  when  the  process 
is  on  the  right.  When  the  signal  strength  is  decreased  to  g(  e)  =  per¬ 

formance  becomes  somewhat  worse,  with  smaller  excursions  away  from  the  uncon¬ 
ditional  probability  value  of  0.5  and  with  occasional  excursions  to  the 
incorrect  side  of  pl  =  0.5,  without  a  transition  having  occurred.  Finally, 
when  g(e)  =  performance  breaks  down  substantially.  Although  the  proba¬ 
bilities  move  in  the  same  general  direction  as  in  the  case,  excursions 

from  pl  =  0.5  are  quite  small,  and  bursts  of  noise  occasionally  generate  the 
incorrect  conclusion  that  a  transition  has  occurred. 
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I I .  AGGREGATE  PROBABILITIES  FOR  WHITE  NOISE  MODEL 

The  output  of  the  aggregate  filter  described  in  Section  6  is  presented 
next  for  the  case  of  t  =  0.01,  g(e)  =  0.1,  0.3  and  1.0  (Figs.  8-5  through  8-7) 
The  results  of  Section  7  indicate  that  for  g(e)  =  0(e^^),  the  performance  of 
this  filter  should  be  asymptotically  close  to  that  of  the  optimal  filter  as 
e  decreases  to  0.  Comparison  of  the  simulation  outputs  of  the  aggregate  and 
exact  filters  shows  almost  exact  agreement  for  the  cases  of  g(  e)  =  0.1  (e^^) 
and  g(e)  =  0.3  In  the  case  of  g(e)  =  1,  agreement  was  still  good,  but 

deviations  of  up  to  0.05  in  magnitude  can  be  found.  In  all  cases,  however, 
the  difference  between  the  aggregate  and  the  exact  versions  of  the  filter  was 
of  much  smaller  order  than  the  bound  of  0(g^(e))  derived  in  Section  7. 

III.  FE/BE  SIMULATION  FOR  WHITE  NOISE  MODEL 

The  FE/BE  model  refers  to  the  case  when  a  sufficient  statistic,  L,  is 
calculated  in  batches  by  a  front  end  with  no  apriori  information.  The  back 
end  then  applies  a  Bayesian  update  procedure  using  successive  values  of  L. 

The  simulations  were  done  with  g  =  0.3  and  z  =  .01;  a  case  in  which  the  opti¬ 
mal  filter  performs  well,  but  does  not  yet  act  like  a  "switch."  The  time 
intervals,  T(e)  selected  for  the  collection  of  data  were  1,  3,  10  and  31  cor¬ 
responding  to  z® ,  z~^^ ,  and  z~^^ ,  respectively.  The  plots  for  these 

cases  are  provided  in  Figs.  8-8  through  8-11. 

The  simulations  show  that  for  relatively  small  time  intervals,  namely 
T(c)  =  1  and  T(e)  =  3,  performance  is  excellent,  as  the  probabilities  gener¬ 
ated,  differ  from  the  optimal  values  by  at  most  .01  for  this  sample  path.  It 
is  our  conjecture  that  for  g(c)T(e)  <  1,  excellent  performance  can  be  expected 


with  deteriorating  agreement  between  the  exact  filter  and  this  approximation 
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when  g(e)T(e)  exceeds  1.  The  values  of  this  product  for  T(e)  =  1  and  3  are 
both  less  than  1,  and  we  do  in  fact  see  very  good  performance  in  this  case. 
When  we  reach  T(e)  =  10,  we  have  that  g(e)T(e)  =  3  >  1  and  therefore  expect 
that  deviations  between  the  FE/BE  filter  and  the  optimal  filter  will  start 
to  increase.  This  is  indeed  the  case;  comparing  the  FE/BE  with  ten  time  unit 
batches  to  the  exact  filter  for  this  sample  path  shows  a  maximum  deviation 
of  .04  while  for  T(e)  =  31,  in  which  case  g(e)T(e)  =  9.3,  we  have  a  maximum 
deviation  of  .08,  with  values  consistently  biased  towards  the  0  or  1  limits. 
It  is  worth  noting  that  although  the  probabilities  generated  by  the  filter 
differ  slightly,  the  final  conclusions  regarding  the  current  state  of  the 
system  seldom  differ. 

IV .  FE/BE  SIMULATION  FOR  AGGREGATE  WHITE  NOISE  MODEL 

The  final  filter  approximation,  actually  a  combination  of  II  and  III, 
was  simulated  with  the  same  parameter  values  as  III,  e  =  .01,  g(e)  =  .3.  The 

batch  times,  T(e),  were  selected  to  be  3  and  10  (Fig.  8-13).  The  results  are 

very  similar  to  those  in  case  III,  showing  excellent  agreement  with  the  exact 
filter  for  T(e)g(e)  =  1  (within  .01)  and  somewhat  larger  differences  (maximum 
.06)  for  T(e)  =  10  which  yields  g(e)T(e)  =3.  A  final  run  is  provided  in  Fig 

8-14  in  which  g(e)  was  decreases  to  .1  while  T(e)  remained  at  10.  In  this 

case,  g(e)T(e)  =  1  so  we  again  expect  reasonable  agreement  with  the  optimal. 
Comparing  Figs.  8-14  and  8-3  we  find  that  this  is  indeed  the  case. 


8.3.2  Quantitative  Aspects  of  White  Noise  Model  Simulations 


ALPHATECH,  INC 


the  aggregate  approximation.  In  particular,  the  quantities  of  interest  were 
measures  of  the  rate  at  which  information  is  supplied  to  the  filter  by  the 
measurements  and  in  the  case  of  the  approximate  filter,  the  difference  between 
the  approximate  output  and  the  optimal  output.  The  first  quantity,  the  infor¬ 
mation,  is  defined  in  Section  7  as  p(t)  and  is  the  magnitude  of  the  difference 
between  the  filter  output  and  the  output  that  would  appear  in  steady-state  if 
no  measurements  were  available.  For  our  four-state  model,  if  we  calculate  the 
probability  of  being  in  the  left  pair  of  states  (p^(t))  then 


y ( t )  =  pL(t)  - 


Y1+Y2 


(8-38) 


In  Eq.  8-38,  Y1  and  y2  are  the  lef t-to-right  and  right-to-left  transition 
rates  assuming  the  fast  transitions  have  reached  steady-state  and  are  given  by 
equations  defined  in  subsection  6.2.1.  For  our  numerical  example,  Y1  =  Y2  =  1 
The  measures  of  these  quantities  that  we  are  interested  in  are  their  suprema 
and  their  mean  square  values. 


Parameter  Values: 

In  all  simulation  runs,  the  parameter  values  that  were  selected  are  given 
by  Eq.  8-37,  and  each  sample  paths  had  a  length  of  1000  time  units. 


Mean  Square  Results 

The  first  three  plots  (Figs.  8-15  through  8-17)  present  the  results  for 
the  mean  square  error  of  the  filter  based  on  the  aggregate  model  and  the 
information  available  in  the  "exact  model."  The  first  graph  provides  a  log- 
log  plot  of  the  mean  square  value  of  the  information  versus  the  order  of  e  in 
the  signal  power  expression.  These  values  are  given  by  Eqs.  8-39  and  8-40. 
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Mean  Square  Information 


Y2  A  2 

Y1+Y2  J 


(8-39) 


Power  of  epsilon  in  signal 


log(g( e)) 
log(e) 


(8-40) 


The  plot  shows  a  distinct  break  point  in  its  behavior  at  g(  e)  = 

Spec! f ica '  ly ,  for  g(e)  =  e^-,  k  <  1/2,  there  is  very  little  difference  in  the 
mean  square  information  rate  generated  by  changing  the  value  of  e  over  several 
orders  of  magnitude.  For  values  of  k  >  1/2,  there  is  a  clear  drop  in  the 
information  available  to  the  filter  as  e  decreases.  This  supports  our  anal¬ 
ysis  which  indicated  that  g(e)  =  e^/2  is  a  critical  dividing  point  in  the  per¬ 
formance  of  these  estimation  algorithms. 

The  second  graph  plots  the  logarithm  of  the  mean  square  error  against 
the  power  of  e  in  g(e).  In  this  case  we  do  not  see  the  threshold  effect  and 
the  plots  appear  linear  for  the  values  of  e  that  were  simulated.  When  the  two 
sets  of  data  are  combined  and  the  quotient  plotted  against  e,  we  see  a  change 
in  behavior  at  g(e)  =  For  g(e)  <  e^'^,  the  plots  are  linear,  which 

slope  1,  indicating  a  linear  relationship  with  e  as  predicted  in  Section  7. 

For  g(e)  >  the  error  is  much  larger  relative  to  the  available  informa¬ 

tion  and  there  is  no  evidence  that  the  ratio  of  error  to  information  will 
decrease  significantly  more.  This  is  again  in  agreement  with  the  theoretical 
results  of  Section  7. 


Supremum  Results: 

#>' 

An  identical  set  of  graphs  (Figs.  8-18  through  8-20)  was  prepared  for 

■v* 

estimates  of  the  suprema  of  the  information  and  the  approximation  error.  The 
.  estimates  were  calculated  using  Eqs.  8-41  and  8-42  for  the  simulation  data. 
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Figure  8-18.  Plots  of  sup 


for  Various  Orders  of  g( e) 
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ESup  =  “a*  PL(t)  -  pL(t) 


(8-42) 


where  pL(t)  and  p^(t)  are  defined  as  the  exact  and  approximate  probabilities 
respectively.  The  first  graph  provides  a  plot  of  log  (I)  versus  the  power  of 
e  in  g(e).  The  plot  exhibits  identical  characteristics  to  that  for  the  mean 
square  quantity.  Comparing  plots  of  maximum  error  and  the  quotient  of  error 
and  information,  we  find  that  they  are  also  very  similar.  The  last  plot,  the 
quotient  of  the  error  and  the  information,  exhibits  a  slope  of  1  as  in  the 
mean  square  case. 


8.3.3  Sample  Paths  and  Probabilities  for  the  Discrete  Measurement  Case 


The  discrete  measurement  simulations  were  run  with  three  different 
filters  as  described  in  subsection  8.2.  The  set  of  nominal  parameter  values 
that  were  chosen  for  the  simulation  are  given  in  Eq .  4-43. 


(8-4 3a) 


*1=1+  8(e) 


*2  =  ^3  =  *4  =  1 


*2  =  U  3  = 


(8-43b) 


(8-4 3c) 


(8-4 3d) 


(8-4 3e) 
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I .  EXACT  FILTER 

Simulations  of  the  exact  filter  were  run  for  g(c)  ranging  from  0.03  to 
3.0.  Prior  to  the  simulations  we  expected  very  poor  performance  for  gj^(e) 

<  marginal  performance  for  g(e)  =  e^/2  an(j  excellent  performance  for 

g(e)  >  e^/2.  The  simulations  are  in  fairly  close  agreement  with  these  con¬ 
jectures.  In  the  first  plot  (Fig.  8-21)  for  g^(e)  =  0.03,  the  filter  output 
appears  much  like  a  random  walk  about  the  p^,  =  0.5  line.  As  gi(e)  is  increased 

to  0.1,  or  in  Fig.  8-22,  performance  improves  with  the  filter  reaching 

the  correct  conclusion  regarding  the  correct  state  of  the  system  when  the  sys¬ 
tem  remains  in  that  state  for  times  which  are  0(e-^).  The  filter  has  little 

chance,  however,  of  detecting  transitions  to  the  left  or  right  if  the  duration 

of  such  a  sojourn  is  for  a  significantly  shorter  period  of  time.  As  the  value 
of  gi(e)  is  increased  to  .3,  l,  and  finally  3  (Figs.  8-23  through  8-25),  the 
filter  output  becomes  much  better,  and  begins  displaying  "switching  behavior." 
In  addition,  short  excursions  of  the  system  to  the  left  and  right  are  detected. 

II.  DIFFERENTIAL  EQUATION  APPROXIMATION  TO  DISCRETE  MEASUREMENT  FILTER 

The  differential  equation  approximations  to  the  optimal  filter  was  simu¬ 
lated  with  identical  nominal  parameter  values  as  in  I ,  but  with  g(e)  =  0.1  and 
time  intervals  for  integration  of  1,  10  and  50  seconds.  In  our  integration 
technique  for  this  case  (driven  by  a  jump  process),  second  order  effects  which 
are  0(g^(e)  AK^)  were  ignored.  Since  1  for  all  of  our  simulations 

0(AK)  =  0(T(e))  and  therefore  ignoring  the  second  order  terms  is  justified  for 
g(e)  T(e)  small.  In  our  simulations,  g(e)  was  0.1,  so  for  the  case  of  T( e)  =  1 
(Fig.  8-26),  the  differential  equation  should  provide  a  good  approximation,  for 


T(e)  =  10  (Fig.  8-27)  a  marginal  approximation,  and  for  T( e)  =  50  (Fig.  8-28) 
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Figure  8-22.  Exact  Probabilities  for  Discrete  Model,  g^(e) 
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Figure  8-27.  Differential  Equation  Approximation  with  Time  Increment  of  10 
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a  poor  approximation.  Examining  the  simulation  results  in  these  figures 
and  comparing  them  to  Fig.  8-22,  we  see  that  the  maximum  deviations  from  opti¬ 
mum  are  approximately  0.01,  0.04  and  0.12,  respectively,  confirming  our  expec¬ 
tations.  To  provide  further  support,  two  additional  simulation  results  are 
provided  both  with  T(e)  =  10,  but  in  the  first  case,  g( e)  =  0.03  for  which  we 
expect  good  performance,  and  in  the  second,  g^(e)  =  0.3,  for  which  we  expect 
poor  performance  (compared  to  optimal).  The  plots  of  the  simulations  are  pro¬ 
vided  (Figs.  8-29  and  8-30)  and  support  these  expectations,  with  maximum  devi¬ 
ations  of  0.005  and  0.150  for  these  sample  paths. 

III.  FE/BE  STRUCTURE  FOR  DISCRETE  MEASUREMENT  MODEL 

The  FE/BE  filter  structure  was  simulated  with  different  lengths  of  time 
used  for  the  batching  interval  at  the  front  end.  For  this  filter  we  expect 
good  performance  relative  to  optimum  for  cases  with  g( e)  T ( e)  <  1  and  poorer 
performance  when  g(e)  T(e)  >  1.  (Note:  This  is  only  a  rough  threshold,  a 
more  precise  condition  on  the  parameters  is  conjectured  in  subsection  8.3.4. 
Therefore,  since  a  value  of  G(e)  =  0.01  was  used  in  this  section,  we  expect 
reasonable  performance  for  T(e)  being  50  or  even  higher.  This  was  the  case 
in  the  simulations  with  maximum  deviations  of  .01,  .02,  .04  from  optimal  for 
T(e)  =  1,  10  and  31,  respectively  (Figs.  8-31  through  8-33).  In  the  final 
simulation  for  g( e )  =  0.1  (Fig.  8-34),  T(e)  was  set  to  75  and  therefore  we 
expect  that  we  will  see  marginal  performance  (eT(e)  =  0.75).  The  simulations 
showed  deviations  by  as  much  as  .08  from  optimal  but  in  general  agreement  was 
quite  good.  A  final  simulation  with  g( c )  =  0.3  and  T(e)  =  20  was  run  to  dem¬ 
onstrate  tile  case  where  absolute  filter  performance  is  fairly  good.  In  this 
rise  kT(i  )  was  0.2  and  relative  performance  was  very  good  as  well  (compare 
Fig.  8  -  5  >  to  Fig.  8-25;. 
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8.3.4  Quantitative  Simulations  for  FE/BE  Processor  in  a  Noiseless  Environment 

In  a  similar  manner  to  subsection  8.3.2,  a  number  of  runs  were  performed 
for  the  filter  which  approximated  the  optimal  filter  in  the  discrete  measure¬ 
ment  case  by  calculating  a  statistic  L(t)  in  a  batch  fasion  and  using  this  in 
a  slower  set  of  calculations.  The  results  are  provided  in  Figs.  8-36  to  8-45. 
The  first  two  graphs  (Figs.  8-36  and  8-37)  provide  log-log  plots  of  the  dif¬ 
ference  between  optimal  and  approximate  filters  versus  g(e)  for  the  value  of 
T(e),  the  time  between  batches,  varying  from  1  to  31  units.  The  value  of  e 
is  selected  to  be  0.1  in  the  first  case  and  0.001  in  the  second.  These  plots 
provide  evidence  that: 

1.  The  difference  between  approximation  and  optimal  is  linear 
in  g2(e). 

2.  For  T(e)  small  relative  to  ,  changes  in  T( e)  should  have 
little  effect  on  relative  filter  performance. 

Figure  8-38  and  8-39  confirm  the  second  point  by  displaying  approximation 

error  versus  T(e)  for  e  ranging  from  0.001  to  0.1.  Finally,  using  these 

plots,  the  mean  square  approximation  error  was  conjectured  to  be  of  the 

following  form: 

1  T 

lira  -  /  (pL(t)  -  pL(t))2  dt  =  Klg2(e)  (l  +  K2  eT2(e))  (8-44) 

T—  T  0 
e+0 

where  Kq  and  K2  are  constants.  To  support  the  conjecture,  log-log  plots  of 
approximation  error  versus  the  expression  in  Eqs.  8-44  with  Kq  =  K2  =  1  were 
generated  for  the  simulation  runs  with  e  =  .1,  .01,  .001  (Figs.  8-40  through 
8-42).  The  linearity  of  the  plots  and  their  slope  of  1  support  the  conjec¬ 


tured  formula  in  Fig.  8-42.  In  the  case  of  z  =  0.001,  the  plots  are  somewhat 


8-38 


Figure  8-40.  Plot  of  MSE  vs.  In (g( e) ( l+eT^( e) ) 2  for 


Figure  8-41.  Plot  of  MSE  vs.  In  (g(  e)  ( 1+eT^  (  e) )  )*■/ ^  f 
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nonlinear,  which  we  believe  is  only  the  result  of  the  time  interval  lor  simu¬ 
lation,  3000,  being  short  relative  to  the  average  slow  jump  time  (1000). 

A  conjecture  regarding  the  mean  squared  value  of  "information"  at  the 
output  of  the  filter  as  a  function  of  g( e )  and  e  was  formulated  using  Figs . 
H--.3  to  8-45.  Figures  8-43  and  8-44  plot  mean  square  information  versus  >. 
and  g(  t; )  with  resulting  slopes  of  -1/2  and  1.  These  two  plots  lead  to  the 
conjecture  of  Eq .  8-45. 

1  T  |  Y2  2  g2(e) 

lim  -  /  j  Pl(  t ) - dt  =  K3  -  .  (8-45) 

T+oo  T  Q  j  Y1+Y2  e 

£-►0 

To  provide  support  for  this  conjecture,  mean  square  information  was 
plotted  against  the  expression  in  Eq.  8-45  with  K3  set  to  1.  The  result 
was  a  linear  plot  (Fig.  8-45)  with  slop  1  providing  strong  support  for  the 
conjecture.  If  we  combine  Eqs.  8-44  and  8-45,  we  can  obtain  a  condition  for 
the  relative  error  of  the  approximation  to  go  to  zero.  First  we  divide  the 
equations  to  obtain  Eq.  8-46. 

=  K4  e(l+eT2( e) )  -  (8-46) 

The  expression  on  the  right  side  of  Eq.  8-46  approaches  0  as  e  approaches 
0,  if  T(e)  satisfies  Eq .  8-47. 

T(e)  =  0(ek)  ,  k  >  -1  .  (8-47) 
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In  summary,  this  section  provides  conjectures  for  the  conditions  on  the 
order  ot  r ,  g(e)  and  T(e)  required  to  provide  a  nonzero  quantity  oi  informa¬ 
tion  at  tiie  tilter  output  and  relatively  unimportant  error  for  the  front-end/ 
back-end  approximation  to  the  ideal  filter. 

8.4  MONTE  CARLO  CHARACTERIZATIONS  OF  BATCH  STATISTICS  L( t ) 

In  Section  6,  a  quantity  L(t)  was  introduced  for  the  discrete  measurement 
model  as  the  likelihood  ratio  defined  earlier  by  Eq.  6-64. 

Pr(x(r),  r<t  j  p(t)  on  the  right) 

L(t)  =  -  .  (8-48) 

Pr(x(x),  x<t  j  p(t)  on  the  left) 

This  value  L(t)  is  used  to  perform  Bayesian  updates  in  the  back-end  pro¬ 
cessor  using  Eq.  6-78.  It  would  be  useful  to  obtain  an  analytical  approxima¬ 
tion  to  the  conditional  distributions  of  L(t)  in  order  to  predict  performance 
characteristics  for  the  estimator  structure.  As  a  first  step  in  obtaining 
such  a  formula,  a  series  of  simulations  were  performed  to  obtain  histograms 
for  the  value  of  H(T(e)),  where  l( T(e))  =  Jin  (L(T(e)))  and  T(  e)  is  the  time 
interval  for  each  front-end  batch.  Figures  8-46  and  8-47  show  normalized 
histograms  for  £(T(e))  based  on  200  sample  paths  and  T(e)  =  1  and  3,  while 
Figs.  8-48  through  8-50  show  histograms  for  T(c)  =  10,  31,  and  100  and  using 
2300  runs.  Each  figure  actually  plots  two  histograms,  each  normalized  to  unit 
area,  one  for  o(t)  on  the  left  and  for  for  p(t)  on  the  right.  The  amount  of 
separation  between  the  curves  provides  a  measure  of  the  amount  of  information 
obtained  regarding  which  pair  of  states  the  system  is  in  (left  or  right). 

The  results  for  T(e)  =  1  and  3  are  somewhat  jagged,  but  are  nearly  on  top 
of  each  other,  and  therefore  a  very  small  amount  of  information  is  obtained  in 
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1  or  3  time  units  when  gi(e)  =  0.3.  The  plots  for  T(e)  =  10,  31  and  100  show 
successively  more  separation  between  the  peaks  of  the  curves.  With  less  over¬ 
lap,  there  is  much  less  chance  of  £(t)  or  L(t)  providing  "incorrect”  informa¬ 
tion  about  the  state  to  the  back  end. 

The  appearance  of  the  histograms  suggests  that  a  Gaussian  approximation 
to  the  conditional  distributions  might  be  effective.  To  obtain  such  an  approx¬ 
imation,  we  need  only  calculate  the  mean  and  variance  of  £(t).  We  start  with 
the  number  of  jumps  K  and  the  residence  time  in  the  the  top  state,  assuming 
that  the  process  is  on  the  left.  If  we  assume  that  T(e)  is  long  enough  for 
the  fast  dynamics  to  reach  steady-state,  we  can  calculate: 


E[Rt(t) ] 

x2t 

Xi  +  x2 

VAR[Rt(t)] 

2  X]^X2t 

(X]+X2)3 

E  [  K( t )  ] 

2  XiX2t 

( X]+X2) 

VAR[K(t)] 

4  Ai X2( Xi2+X22) 

(X]+X2)3 

C0V[Rt(t)  K( t ) ] 

2  XiX2(Xi-X2)t 

(Xi+X2)3 

Using  these  equations  we  can  calculate  the  conditional  mean  and  variance  of  £ 
given  that  we  are  on  the  left. 
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E[jt|L]  =  Ci  E[Rt|L]  +  —  C2  E[K|L]  -  C4  (8-49) 

VAR[«.|L]  =  C12  VAR[ Rt | L]  +  C22  VAR[K|l]  +  2  Ci  C2  COV[RtK|L]  (8-50) 

where  ,  C2 ,  and  C4  are  defined  in  the  Appendix  C  (subsection  C.2).  Using 

Eqs.  8-49  and  8-50  and  their  counterparts  if  the  process  is  on  the  right,  the 

approximate  conditional  distributions  for  i,(t)  were  calculated  and  plotted  for 
each  value  of  T(e)  that  was  used  in  the  simulations.  Comparing  these  approx¬ 
imations  (Figs.  8-51  through  8-55)  to  the  simulations  we  see  there  is  close 
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8.5  DISCUSSION 

The  intention  of  the  simulations  was  to  provide  evidence  supporting  both 
conjectures  and  theoretical  results  from  Sections  6  and  7. 

For  the  noise  model  we  note  some  important  features  of  the  simulation 
results : 

1.  For  large  signal  strengths,  the  estimator  behaves  in  a  switch¬ 
like  fashion,  detecting  state  transitions  shortly  after  they 
occur . 

2.  For  low  signal  strengths,  insufficient  information  is  obtained 
between  transitions  to  determine  the  current  state  of  the  system. 

3.  All  the  approximate  filters  approach  the  performance  of  the 
optimal  filter  for  appropriate  parameter  ranges. 

4.  The  nean  square  approximation  error  of  the  aggregate  model  filter 
is  a  factor  0(e)  smaller  than  the  mean  square  "information"  in 
the  output  of  the  optimal  filter  for  small  e  and  g(e)  <  e^2. 

5.  For  g(e)  >  e^2,  the  mean  square  value  of  the  information  rate 
is  relatively  unaffected  by  decreases  in  the  magnitude  of  e. 
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For  the  discrete  measurement  model  we  can  make  the  following  general  comments: 

1.  The  choice  of  the  magnitude  of  gi(e),  the  differences  in  the 
fast  transition  rates,  has  a  critical  effort  on  filter  perfor¬ 
mance  which  can  vary  from  providing  virtually  no  information 
(gi(e)  very  small)  to  switch-like  performance  (gi(e)  large). 

2.  For  small  values  of  g(e)T(e)  a  differential  equation  on  the  slow 
time  scale  can  be  used  to  approximate  the  optimal  filter  with 
good  accuracy. 

3.  The  optimal  filter  can  be  approximated  by  a  two-stage  structure 
with  front  end  calculating  Rt  and  K  and  back  end  carrying  out 
prediction  and  update  steps  given  appropriate  conditions  on  the 
system  and  filter  parameters.  This  condition  is  conjected  to 
be  that  the  expression  in  Eq.  8-46  be  small. 

4.  The  conditional  distribution  of  the  statistic  £.(t),  the  log  of 
the  likelihood  ratio  defined  by  Eq.  6-64  can  be  well-approximated 
by  a  Gaussian  distribution.  Furthermore,  the  mean  and  variance 
of  the  distribution  can  be  approximated  by  simple  functions  of 
the  fast  transition  rates. 
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SECTION  9 
CONCLUSIONS 

In  this  report  we  have  described  the  results  of  our  research  on  alternate 
architectures  for  nonlinear  filtering  problems  possessing  the  key  features 
found  in  ASW  problems.  Specifically,  we  have  focused  on  systems  in  which  a 
slowly-varying  part  of  the  state  influences  or  modulates  the  behavior  of  a 
much  more  quickly  varying  portion  of  the  state.  This  fast  variable  is  then 
observed,  and  the  ultimate  objective  is  to  estimate  the  slow  variables. 

As  we  have  discussed,  this  problem  can  be  cast  in  the  framework  of  per¬ 
turbation  methods.  In  Section  4  we  provide  a  survey  of  research  on  perturbed 
estimation  problems  and  discuss  the  architectural  implications  of  these  vari¬ 
ous  results.  A  conclusion  of  this  survey  is  that  there  are  no  previously- 
developed  results  that  are  applicable  to  problems  of  the  type  of  interest 
in  this  study. 

There  are  several  key  ideas  that  we  wished  to  address  in  this  study.  One 
was  to  develop  an  understanding  of  the  relationship  between  the  time  scale  of 
the  slow  process  and  the  order  of  the  magnitude  of  the  measurement  signal-to 
noise  ratio.  The  second  was  to  investigate  alternate  suboptimal  architectures 
and  in  particular  the  so-called  front-end/back-end  architecture  commonly  used 
in  ASW  systems.  In  such  a  system,  the  front  end  processes  a  batch  of  data  to 
produce  an  estimate  of  the  slow  variables  based  on  the  assumption  that  the 


slow  variables  are  constant  over  the  batch.  This  sequence  of  batch-produced 
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estimates  Is  then  fed  into  a  back-end  processor  that  tracks  the  slow  varia¬ 
tions  in  the  slow  process.  Questions  of  interest  are:  when  is  this  archi¬ 
tecture  a  good  one  and  how  long  should  the  batches  be  when  compared  to  the 
natural  scales  determined  by  the  slow  process  and  by  the  measurement  infor¬ 
mation  rate. 

Our  investigation  of  these  and  several  related  questions  has  been  carried 
out  using  a  relatively  simple,  four-state  Markov  process  possessing  two  time 
scales.  We  have  used  two  different  measurement  models  in  our  study:  one  in 
which  poor  quality,  noise-corrupted  measurements  of  the  fast  process  are 
available  and  one  in  which  we  have  perfect  measurements  of  the  fast  process 
but  the  influence  of  the  slow  process  on  the  fast  one  is  quite  weak.  In  this 
context  we  have  been  able  to  explore  at  great  depth  a  variety  of  suboptimal 
filter  structures  and  have  obtained  both  theoretical  results  and  convincing 
evidence  from  simulations  concerning  the  asymptotic  optimality  of  several 
suboptimal  estimators. 

In  particular,  for  the  problem  with  poor  noise-corrupted  estimates: 

1.  We  have  proven  an  important  result  showing  the  asymptotic  opti¬ 
mality  of  a  suboptimal  slow  estimator  that  essentially  averages 
out  the  fluctuations  in  the  measurements  due  to  the  fast  vari¬ 
ables.  This  result  provides  a  clear  picture  of  the  relationship 
between  time  scales  and  the  asymptotic  quality  of  the  measure¬ 
ments.  These  results,  and  several  stronger  sample  path  proper¬ 
ties,  have  been  demonstrated  in  our  simulations. 

2.  We  have  implemented  front-end/back-end  structures  both  using  the 
full,  four-state  model  and  the  same  averaging  strategy  described 
in  1.  Simulations  have  shown  that  both  of  these  estimators 
achieve  asymptotically  the  same  performance  as  the  optimal  filter. 

In  addition,  these  results  suggest  particular  asymptotic  formulas 
for  the  length  of  the  time  interval  used  in  the  front-end  batch 
processor . 


For  the  case  perfect  fast  measurements  but  weak  slow-to-fast  coupling: 
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1.  We  have  demonstrated  via  simulations  that  an  extremely  slow 
integration  of  the  optimal  estimation  equations  —  which  is 
equivalent  to  a  simple  front-end  counter  and  interruptable  clock 
followed  by  a  slow  back-end  processor  —  performs  asymptotically 
as  well  as  the  optimal. 

2.  We  have  also  studied  the  standard  front-end/back-end  algorithms 
for  this  problem.  Our  extensive  simulations  provide  extremely 
convincing  evidence  supporting  our  conjectures  concerning  asymp¬ 
totic  optimality  and  the  relationships  among  the  orders  of  mag¬ 
nitude  of  time  scales,  slow-to-fast  couplings,  and  front-end 
batch  lengths. 

The  results  presented  in  this  report  not  only  are  of  significance  in 
their  own  right,  but  they  also  provide  clear  direction  for  further  work:  in 
the  theoretical  verification  of  the  numerous  conjectures  developed  through 
our  simulations;  in  developing  higher-order  approximations  that  indicate  the 
value  of  slightly  more  complex  estimators;  and  in  extending  these  ideas  to 


continuous-state  nonlinear  systems. 
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APPENDIX  A 

SOLUTION  OF  THE  WAVE  EQUATION 

Pressure  in  a  three-dimensional  homogeneous  fluid  with  sound  speed  c 
satisfies  the  wave  equation 

—  •  — - V^u  =  f(x,t)  (A— 1) 

c2  a  t2 


where  f(x,t)  is  the  source  field.  We  are  interested  in  the  case  of  a  moving 
point  source  at  xx(t)  for  which  f(x,t)  is  given  by 


f(x,t)  =  4ttS(x  -  xT(t))*yx(t) 


(A-2) 


where  6  is  the  three-dimensional  Dirac  delta  function  and  yf(t)  is  the  source 
signal.  The  solution  of  Eq.  A-l  can  be  expressed  in  terms  of  Greens  function 
as  an  integral 


fU.t  - 


u(x,t)  =  / 


4  TT  •  |x-£  | 


(A-3) 


Substituting  Eq.  A-2  in  Eq.  A-3  gives 


«  U-xT  t 


u(x,t)  =  / 


lx~C  | 


(A-4) 


To  integrate  Eq.  A-4  we  need  to  change  variables  from  £  to  ;;  defined  by 


I 


% 

I 


A 


A, 


v 


A 
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5  =  £  -  xT  I  t  - 


(A-5) 


Note  that  the  Jacobian  of  the  transformation  is 


3t\  xT  (x-C)T 

~  =  - 

3C/  c  |x-H 


(A-6) 


where  the  superscript  T  in  Eq.  A-6  denotes  vector  transposition  and  I3  denotes 


the  3x3  identity  matrix.  The  determinant  of  the  Jacobian  is  needed  to  trans¬ 


form  variables  in  the  integral  and  is  given  by 


/3c\  <xt,x-5> 

det  I  —  1=1-  - 

\n!  c.|x-H 


The  integral  in  Eq.  A-4  becomes 


l'1  /  l*~£| 

<xt,x-£>  ylt - 

[  1 - - - — 


c*|x-£| 


6(C)  dc 


where  £  in  Eq.  A-8  satisfies  Eq.  A-5.  Thus,  we  get 


u( x , t )  =  1  - 


<xTtx-C>  ylt  - 


c- Ix-C| 


where  £  is  chosen  to  satisfy 


0  =  £  ~  Xx  1  t 


(A-7) 


(A-8) 


(A-9) 


(A-10) 


i'.y 
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Letting 


we  see  that 


(A— 11) 


CT  =  |x-XX(t-T)j  , 


(A-12) 


1  -  T  =1  1  - 


<Xt,x-5> 


C.  x-5 


(A-13) 


and  finally 


y(t-t) 

u(x,t)  =  — — —  (1-t)  • 


(A— 14) 
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APPENDIX  B 

FILTERING  EQUATIONS  FOR  PARTIALLY-OBSERVED 
FINITE-STATE  CONTINUOUS-TIME  MARKOV  PROCESS 

B.l  PROBLEM  FORMULATION  AND  FILTER  EQUATIONS 

Suppose  that  x(t)  is  a  finite-state  continuous-time  Markov  process  and 
define  the  observation  process  y(t)  by 

y(t)  =  h(x(t) )  .  (B-l) 

We  will  derive  a  stochastic  differential  equation  for  the  conditional  proba¬ 
bility  trt(C)  defined  by 

TTt(C)  =  Pr  {x(t)=£  |  y(x)  ,0>T>t}  .  (B-2) 

Let  X(cU')  denote  the  transition  rate  from  £'  to  £.  That  is, 

Pr{x(t+A)=E|x(t)=C'}  =  +  A»X(5|5')  +  o(A)  ,  (B-3) 

where  A>0,  655*  is  the  Kronecker  delta  function,  and  o(A)  is  an  error  term 
that  tends  to  0  faster  than  A  as  A  tends  to  0. 

Assume  that  x(t)  is  a  version  of  the  Markov  process  which  has  right 
continuous  sample  paths  and  which  has,  at  most,  a  finite  number  of  discon¬ 
tinuities  in  any  finite  time  interval.  Let  Jt  denote  the  counting  process 
associated  with  y.  That  is,  Jt  is  the  number  of  discontinuities  of  y  in  the 
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We  will  show  that  itt(£)  satisfies  the  stochastic  differential  equation 


t  t 

*t(S)  =  *0(0  +  /  Fs  ds  +  /  Gs  dJs 
0  0 


(B-A) 


where  the  first  integral  is  Lebesgue  and  the  second  is  Stieltjes.  If  l<k, 


are  the  transition  times  of  the  y  process  (and  therefore  also  of  the  counting 


process  J),  then  the  Stieljes  integral  in  this  case  is  simply 


/  Gs  dJs  -  I  Gt^ 

0  Tk<t 


(B-5) 


The  integrands  Ft  and  Gt  are  defined  as  follows. 


Ft  =  6h(£)y(t)*[E  X(cU')irt(5')  -  "tCO’E  X(y(t)  |  O  )*t(  O  )  1  (B-6) 

O  V 


where 


x(y(t)U’)  =  z  x(rU’)  , 

h(c”)=y(t) 


(B-7) 


l  X(5U')  TTfU') 


Gt  =  6h(Oy(t)  * - "t-CO 

I  X(y(t)U')TTt-U’> 

O 


(B-8) 


where 


*t-(0  =  lim  tts(0 
s  +  t 


(B-9) 


A  special  case  of  this  general  result  is  when  x(t)  has  two  components 


x(t)  =  (x1(t),X2(t)) 


(B-10) 


N  *.  *.  *. 


a a  a . a  a 


m.  A  ^  K ^  ' 
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y(t)  =  h(x(t) )  =  x2(t) 


In  this  case  the  stochastic  differential  equation  for 


( B— 11) 


TTt  c  Cl)  =  Pr{xi(t)=c|y(t),0<T<t} 


(B-12) 


is  the  same  as  (B-4)  with  F  and  G  given  by 


Ft  *  ^/(e1,y(t)U[,y(t))irt(q)  -  i^cq)  •  ^ x(y(t)|q,y(t))irt(q)  (B-13) 


£  x(c1,y(t)Uj,y(t-))irt_(5p 
Si 

Gt - »t-Ul) 

l  x(y(t)|c’  y(t-))it  ( Ci ) 

ct  1  L  L 


(B-14) 


B.2  OUTLINE  OF  PROOF  OF  FILTERING  EQUATIONS 

A  direct  proof  of  these  results  depends  on  showing  that  (1)  ut(5)  has 
a  derivative  given  by  F^  at  each  time  t  at  which  does  not  jump  and  that 
(2)  at  jump  times  t,  xt(£)  satisfies 


nt(C)  =  Gt  +  iTt-U) 


(B-15) 


For  jump  times  t,  Eq.  B-15  follows  directly  from  Bayes  rule  for  a  finite-state 
Markov  chain.  We  will  sketch  how  to  prove  (1)  in  the  rest  of  this  section. 
Assume  that  t>s  and  define  As>t  as 


As>t  =  1  if  J t  Js  =  0 
=  0  if  J J c  ^  1 


(B-16) 


I’h 


m 
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;; 
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Define  4>s  as  a  function  of  sample  paths  of  y  as 


^(yKO  =  y(r)  for  T<t 


=  y(t)  for  T>t  . 


(B-17) 


Suppose  that  f  is  Yt  measurable  (Yt  =  o{y(x)  ,0<T<;t} ) .  Then  f.<|>s  is  Ys  mea¬ 


surable  and 


f *As,t  -  (  f  •  <t> s )  *As,  t 


(B-18) 


Therefore,  letting  f  =  nt(£),  we  get 


1Tt(0*As>t  -  (lTt(0*4ls)*AS,t 


(B-19) 


which  implies  that  irt(£).Agjt  is  Ys  V  o{As>t)  measurable.  Thus, 


"tU)‘As,t  =  E{TTt(0|Ys,As>t}.As>t 

—  Pr  {x(  t  )=£  |  Ys , As  ^  j- }  «AS ^  t  . 


(B-20) 


Note  that 


Pr{x(t)  £  ( Ys  ,/ig  ^  ^  =  1 }  “ 


Pr{x(t)=C,As>t=l|Ys} 


Pr{As,t=l I Ys } 


(B-21) 


where 


Pr{x(t)=5,As>t=l|Ys}  =  Z  Pr{x(t)  =  C,As  c=1|x(s)  =  5’}-tts(S') 

r 


(B-22) 


Pr  {As»  **  |  y  s }  =  1  Pr{As  t=l|x(s)  =  c’}^s(5’)  • 

C' 


(B-23) 


Let  Bs>t  denote  the  event  that  there  is  more  than  one  jump  of  x  in  the  inter¬ 


val  (s,t],  and  let  B^  t  be  the  complementary  event  that  there  is,  at  most,  one 


one  jump  in  (s,tj.  Then  we  have  the  relationships 
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6  .Pr{x(t)=£,BC  |x(s)=C'} 

h(C)h(5  )  s*t 

<  Pr{x(t)=C,As>t=l |x(s)  =  C'  } 

and 

<5h(?)h(C,)*Prtx(t;)  =  C  |x(s)=C'  } 

>  Pr{x(t)=5,ASjt=l|x(s)=C*} 

It  follows  from  Eq.  B-24  and  B-25  that 

l6h(C)h(5')*Pr{x(t)=C|x(s)=C,>  ~  Pr{x(t)=C,As>t=l jx(s)=5' }| 

<  Pr (Bs> 1 1 x(s)  =  C' }  • 


(B-24) 


(B-25) 


(B-26) 


Note  that 

Pr(®s, t I X(S)=C ' )  =  o(t-s)  . 

Putting  Eqs .  B-27  and  B-26  together  with  Eq.  B-3  gives 

Pr{x(t)=C,ASft=l|x(s)=C'}  =  'ShCOhU')*^^'  +  (t_s)  **U|  V  )  1 

+  o(t-s) 


(B-27) 


(B-28) 


Using  Eq.  B-28,  we  can  derive  the  approximations 


pr{x(t)-£  ,AS- t=l.  |  Ys)  -  6n( 5)y(s)  *  t xs( 5) 

(B-29) 

+  l  A(c|C*)-tts(C’)  •  ( t-s ) ]  +  o(t-s) 

h(5')=y(s) 

and 

pr{As  t=l  |  Y  s }  =  1  +  l  l  A(C"U')»its(C,)*(t-s) 

h(5")=h(C’)=y(s)  (B-30) 

+  o( t-s ) 


From  Eqs.  B-21,  B-29,  and  B-30,  we  get 


B-5 
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Pr{x(t)=s|Ys,ASft=l} 

=  6h(£)y(s)  *  [*s(0  +  2  X(e|c,)*ws(C,)«(t-s) 

h(C')=y(s) 

E  Z  X(g"  U'WsU’J^sUMt-s)]  +  o(t-s) 
h(C")=h(C')=y(s) 

Note  that  irs(£)  =  0  unless  h(£)  =  y(s).  Thus, 

6h(5)y(s)*ns(5)  =  *s(5) 
and  Eq.  B-21  can  be  expressed 
Pr{x(t)=c|Ys,ASjt=l} 

=  irs(5)  +  «h(C)y(s)  ’  [  *  X(cU’)*Trs(5')*(t-s) 

h(C’)=y(s) 

-EE  X(C' |5')*its(5,)*ns(5)*(t-s)]  +  o(t-s) 
h(5")=!h(5,)=y(s) 

From  Eq.  B-20, 

MO  ~  =  [Pr{x(t)=5 |YS,AS> ^=1}  ~  'Rs(0]*As,t 

+  [MO  “  MO]*U  ~  As,t) 

and  it  follows  from  Eq.  B-33  and  the  definition  of  Fs  that 

MO  “  MO  =  Fs’(-t~s )  +  o(t-s) 

+  [MO  -  MOl’C1  ~  As,t)  • 


Finally  note  that  because  y  is  right  continuous, 
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I  ' 
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is  true  for  all  s.  If  t  is  not  a  discontinuity  of  y,  then  it  is  also  true 
that 


1-A 


lim 


s,t 


=  0 


stt  t-s 


(B-37 ) 


Note  that  u^,  nS)  and  Fs  are  uniformly  bounded.  Using  this  fact  together  with 
Eqs.  B-36  and  B-37,  it  follows  that  Eq.  B-34  implies  irt  is  continuous  at  t  if 
t  is  not  a  discontinuity  of  y.  Consequently,  Ft  is  also  continuous  at  such 
times  t,  so  that 

(B-38) 


lim  F s  =  Ft 
s  +  t 


in  Eq.  B-34.  It  follows  that  irt  is  continuously  differentiable  at  t  with 
derivative 


irt<5)  _  *s(£) 

lim  -  =  Ft 

s  +  t  t-s 


(B-39) 


B  .3  EXAMPLE 

As  example,  consider  the  four-state  Markov  process  with  state  space 
{0,1}X{0,1}  and  transition  rates  given  by  (see  Fig.  B-l) : 


X(0,l|0,0)  =  a,  X(0,0|0,1)  =  a’ 
X<1,0|0,0)  =  S,  X(0,0|l,0)  =  B’ 
X(1,1|1,0)  =  a,  A(1 ,0 | 1,1)  =  a' 
X( 1 > 1 1 0 , 1 )  =  y ,  X(0,1 1 1 , 1)  =  y' 


(B-40) 


and 


X(0 ,0 | 0 ,0)  =  -(a+B) 
X(1,0|1,0)  =  -(a+B1) 
X(  1 ,1  1 1 ,1)  =  — (a'+y* ) 
X(0 , 1 | 0 , 1 )  =  -(a’+Y) 


(B-41) 


B-7 


>  >V 


*-  .V 
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The  X(X|q,y')  are  given  by 


ft 


V . 

S  m 


X(l|0,0)  =  X(ljl,0)  =  a 

X(0j0,0)  =  X(0|1,0)  =  -a 

X(0|0,1)  =  X(0 | 1,1)  =  a’ 

X( 1 | 0 , 1 )  =  X(l|l,l)  =  -a' 


(B-42) 


:-5 


In  this  example,  satisfies  a  linear  differential  equation  with  coeffi¬ 

cients  that  switch  randomly  with  y (t).  That  is, 

7rt(0)  =  8’  -  (3  +  8')irt(0)  when  y(t)  =  0  (B-43) 

and 

tt t(0)  =  y*  ~  (Y  +  Y*  )wt(0)  when  y(t)  =  1  .  (B-44) 

Let  0 o  and  0^  denote  the  equilibrium  points  of  the  individual  differential 
equations  (Eqs.  B-43  and  B-44),  namely 


i 


B' 

00  B  +  B' 


(B-45) 


lr 


•7 

*■  . 


3 

v, 


V, 

V, 

V 


9l  =  •  (B-46) 

During  an  interval  of  time  (t^.t^+i]  when  y(t)  =  i,  the  probability  iii(i) 
approaches  0^  exponentially.  If  0g  *  0]_,  then  n t ( 0)  eventually  becomes 
trapped  between  0q  and  0]^  and  oscillates  randomly  as  shown  in  Fig.  B-2.  If 
00  =  ®1»  then  tt t ( 0 )  approaches  0^  in  the  limit.  If  0,  0',  y,  y'  are  very 
large  compared  to  a,  a',  then  the  oscillations  approach  the  limit 

0O* [ l-y(t)]  +  6i-y(t)  .  (B-47) 


B-8 


X 
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B.4  REGULAR  MEASUREMENT  MODEL 


Suppose  that  x^(t)  is  itself  a  Markov  process  and  X2(t)  is  a  process  that 


depends  on  x^Ct)  in  such  a  way  that  x(t)  =  (x^( t ) , X2( t ) )  is  a  Markov  process. 


Furthermore,  suppose  that  for  almost  all  sample  paths  of  x(t),  x^(t)  and  X2(t) 


do  not  jump  simultaneously.  Then  the  transition  rate  X  for  the  joint  process 


x  has  the  following  form 


6C2C2*X1(CI)  +  6ciCi*X2(5i’C2) 


(B-48) 


where  X^  is  the  transition  rate  of  x^ ,  and  X2  is  a  transition  rate  for  a 


process  X2  if  £[  is  held  constant,  that  is 


X2(C2U[,C2>  >  0  if  C,  *  C? 


2 


(B-49) 


r  a(c2 *  0 


(B-50) 


jr  each  £j,  To  see  that  Eq.  B-48  is  true  note  that  if  x^(t)  and  x^Ct) 


do  not  have  simultaneous  jumps,  then  X  can  be  written  in  the  form 


<5^ 


(B-51) 


Note  that 


0  =  z  x(c1,c2U[,C2) 


Cl. 52 


(B-52) 


1  +  J:  a2(C2U[,C2) 
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Define  ,  a2  as 


€1 


a2^2^1’^2^  a2^2^1’^2^  ^2  ’^2*^  a2^2^1’^2^ 


5l 


Then  X  can  be  written  as 


where 


xccj.CjIcj.ej) 

-  s«2«2**iuii?i-t2)  +  vrv^hi-q)  • 

e  liCqlcj.ep  =  0  • 


(B-53) 

(B-54) 


(B-55) 


(B-56) 


Summing  Eq.  B-55  over  £2  gi-ves 

XC^Iq)  -  I  X(C1,C2U[,Cp  «  .  (B-57) 

52 

Substituting  Eq.  B-57  in  Eq.  B-55  gives  the  desired  expression  (Eq.  B-48). 
Equation  B-50  is  just  Eq.  B-56,  and  Eq.  B-49  follows  from  Eq.  B-48.  Note 
that  Eq.  B-48  can  also  be  expressed  by  saying  that  the  infinitesimal  generator 
A  of  x  given  by 


ufxq,/^)  =  e  xo^.^iq.Ep-fcq.ep 

^1  ’^2 


(B-58) 


has  the  form 


(Af)(^i ,C2>  ~  Aif(* ,52)(5l)  +  A2(5l)f(^l» *)(S2) 


(B-59) 
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where  A^  is  the  infinitesimal  generator  corresponding  to  \\  and  A2(£l)  is  the 
infinitesimal  generator  corresponding  to  Cl  *  *  I  Cl > *)  • 

We  will  call  Eq.  B-48  the  regular  measurement  model.  It  is  the  discrete 
state  version  of  the  following  measurement  model  typically  used  for  diffusion 
processes : 


dx^  =  f(x^)dt  +  dw 


(B-60 


dx2  =  h(xi,x2)dt  +  dv 


(B-61 


Note  that  the  regular  measurement  model  is  only  a  special  case  of  the 
partially-observed  Markov  process  formulated  in  Eq.  B-l.  In  some  cases,  it 
will  be  necessary  to  consider  the  more  general  model  (e.g.,  the  limit  of  a 
parameterized  family  of  regular  models  need  not  be  regular). 

The  finite-state  filtering  equation  for  the  special  measurement  model 
(Eq.  B-48)  is  given  by 


Ft  *  £|Xl^il€{)*t(5i)  +  X2^y(t)lCl’y(t)^t(Cl) 

C1 


(B-62 


-  irt(Cl)  •  Z  A2 (y ( t )  I  c;  , y(  t )  ) IT  (  ) 

q 


X2(yCt> | ci ,y<c-) )wt_(c1) 
x2(y(t)U[,y(t-))nt_(5[) 


-  *t-(Ci) 


(B-63 


We  are  particularly  interested  in  singular-perturbation  problems  where 


the  observation  process  y  =  x2  is  speeded  up  by  a  factor  of 


This  gives 


X(C1,C2  |c[,^2> 


=  6c9c;-xi<£iU[>  + 


(B-64 


•  St,ti‘i2U2Ui,t2>  ' 


wl 
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The  singularly  perturbed  filtering  equations  are 

Ft  =  Z  X1(C1)U[)ut(£;{)  +-j~  *  X2(y(t)|cl,y(t))nt(e1) 

Cl  (B-61)) 

-  ~~  •  ^(C^  •  1  X2  (y(t)  j  C{,y(t)  )7rt(  C[)  • 

C1 

X2(y(t)Ui,y(c-))nt-(^i) 

Gt  =  - —  -  nt-(^l)  *  (B~6b) 

x2(y(t)|c[,y(t-))irt_(cp 

5i 

B .5  UNNORMALIZED  FILTERING  EQUATIONS 

In  nonlinear  filtering  for  diffusion  processes,  Zakai's  equation  for 
an  unnormalized  conditional  density  is  linear  in  the  density  and  easier  to 
analyze  than  the  nonlinear  equation  for  the  conditional  density  itself. 

There  is  an  analogous  equation  for  an  unnormalized  conditional  probability 
distribution  in  the  finite-state  continuous-time  filtering  problem.  Consider 
the  general  problem  formulated  in  Eq.  B-l .  Let  qt(C)  be  the  solution  of  the 
following  stochastic  differential  equation. 

=  [fih(c)y(t)  •  z  tfelOqtCOJ'dt 

V  (B-67) 

+  ^6h(c)y(t)  *  2  *UU')qt-U')  -qt-(5)]*dJt  • 

V 

Define  qt  to  be  the  normalization  factor 

qt  =  I  qtU)  •  (B-68) 

C 


Then  the  conditional  density  Trt(C)  is 
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qt(C) 

*t(0  =  — -  .  (B-69) 

qt 

It  Is  not  hard  to  show  that  Eq.  B-69  is  true  for  all  t  provided  that  it  is 
true  initially  at  t=0.  At  a  jump  time  t  = 

qtU)  =  6h(c)y(t)  •  £  U')qt-(0  (b-™) 

V 

so  that  if  Eq.  B-69  is  true  for  t<Tk,  it  is  clearly  true  at  Tk  (that  is,  Eq. 
B-70  preserves  the  relation  (Eq.  B-69)  through  the  jump).  In  between  jump 
times,  tk  <  t  <  tk+l.  9t(0  and  it  satisfy  the  ordinary  differential  equations 

qtU)  -  «h(()y(t)  •  £  X(cU') qt(C')  <B-71) 

V 

and 

qt  =  1  X(y(t)|c,)qt(C')  •  (B-72) 

V 

It  follows  that  the  quotient  qt(C)/qt  satisfies  the  same  equation  as  ut(£)  in 
between  jump  times,  namely 

*t(5)  =  «h(Oy(t)  *  [£  *(cU’)"tU'> 

V  (B-73) 

-  Trt(C)  •  l  x(y( t )  |  e'  )nt( C* )  ]  • 

V 

Thus,  if  Eq.  B-69  is  true  at  t  =  Tk,  it  will  remain  true  throughout  the 
interval  Tk  <  t  <  Tk+l* 

For  the  measurement  model  given  by  Eq.  B-ll,  the  unnormalized  distribu¬ 
tion  satisfies  the  equation 
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dqt(Cl)  =  [E  A(c1,y(t)|^J,y(t))qt(Cp ]*dt 

(  B-74 

+  [e  »y(t) U| ,y(t-) )qt_Uj)  -  qt_(C1)]-dJt  . 

^1 

For  the  regular  measurement  model,  the  unnormalized  distribution  satisfies 
the  equation 

dq  t  ( £  i )  =  [E|X1(C1|epqt(Cj)  +  X2(y(t)|c1,y(t))qt(C1)]-dt 

C1  ( B-75 

+  [ X2 [y< t ) U1,y(t-))qt_(£i)  "  q t-(Cl )]*dJt  • 

For  the  singularly-perturbed  problem  we  have  two  choices:  either  to  use  the 
unnormalized  equation  (Eq.  B-75)  obtained  by  replacing  X2  with  ^  -  X2  or  to 
use  a  somewhat  different  version  derived  directly  from  Eqs.  B-65  and  B-66 
in  the  same  way  as  Eq.  B-67  was  derived.  Note  that  there  are  many  possible 
unnormalized  distributions,  and  no  one  choice  need  be  uniquely  the  best.  We 
present  the  different  version  because  it  appears  to  give  better  scaling  in 
the  jump  term. 

(E  X1U1UpqtUp  +  ~~  •  X2(y(t)|ci,y(t))qt(e1)]*dt 
C1  (B-76 

+  [X2(y(t)Ui,y(t-))qt-(?i)  “  qt-Ui>  ] *dJt  • 

Note  that  some  care  needs  to  be  taken  in  analyzing  the  asymptotic  behavior  of 


it^  by  analyzing  an  unnormalized  distribution  such  as  qt  in  Eq.  B-76.  If  q^ 
has  leading  order  magnitude  41(e),  then  we  need  to  analyze  qt  to  higher  order, 
namely  o(i|/(e)),  in  order  to  say  something  useful  about  the  asymptotics  of  up 


ALPHATECH,  INC 


APPENDIX  C 

DECISION  PROBABILITIES  FOR  THE  HYPOTHESIS  TEST  OF  EQS.  6-57  AND  6-58 

We  are  concerned  here  with  analyzing  the  performance  of  the  decision  rule 
specified  by  Eqs.  6-57  and  6-58.  For  simplicity  we  drop  the  subscript  k  and 
we  assume  that  x(0)  =  1  (analogous  expressions  can  be  developed  if  x(0)  =  0). 
Taking  logarithms  of  Eqs.  6-57  and  6-58  and  simplifying,  we  see  that  an  equiv¬ 
alent  form  for  this  decision  rule  is  for  K  even 


C!  R  + 


1 


mk  =  R 

> 


-  C2  K 

2  < 


C4 


(C-la) 


mk  =  L 


while  for  K  odd  it  is 


1 


m'K  =  R 

> 


Ci  R+  -  C2(K-1)  +  C3  C4 

2  < 


(C-lb) 


mk  =  L 


where 


Ci  =  (8-a)  g(e)  ,  C2  =  log 


ctg(  f  Bg(e) 
1  +  -  1  +  — 


*1 


*2 


C3  =  log 


1  + 


ag(  e) 


.  C4  =  Bg(e)  T(e) 


(C-2) 


C-l 
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The  key  to  the  performance  of  Eq.  C-l  is  then  the  joint  distribution  for 
R  and  K.  under  each  hypothesis  (i.e.,  p(t)  on  the  left  or  p(t)  on  the  right). 
For  the  case  of  p(t)  on  the  left 


P(R,  K=0)  =  e"XlT(e)5(R-T(e)) 

P(R,  K=l)  =  e(X2-Xl)R  e~X2  T(e) 
p(R,  K=2)  =  ( XXR) X2  e(X2-Xl)R  e_X2  T(e) 


p(R,  K=2M-1) 


M  (M-l ) 

(XiR)  [ X2(T( e)-R) ] ; 

-  - 

M!  (M-l)!  , 


e(X2-Xi)R  e-X2  T(e) 


p(R»  K=2M) 


M  (M-l) 

(MR)  U2<T(e)-R)r  ’ 

-  -  ^ 

M!  (M-l)! 


e(X2-Xi)R  e-X2  T(e) 


(C-3) 


where  the  last  two  expressions  for  for  M=  2,3,...  . 

To  illustrate  the  nature  of  the  computations,  suppose  that  ,  C2  >  0. 

Then 


°°  /  C4~C2M  \  »  /  C4~C3-C2M 

4>LL(e)  =  l  Pr  R  <  -  ,  K  =  2M)  +  £  Pr  R  <  - 

M=0  V  C1  )  M=0  C1 


N^e)  T(e)  N2U)  Yl(e) 

=  l  \  p(R,2M)dR  +  l  /  p(R,2M)dR 

M=0  o  M=Ni(e)+l  o 


K  =  2MF1 


) 


N3(e)  T(e)  N4(e)  Y2(e) 

+  l  j  p(R,2Mfl)dR  +  I  /  p(R,2M+l)dR 

M=0  o  M=N3(e)+l  o 

(C-4) 


C-2 


'  N 


k 


p 
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K  . 
h\ 


i 


whert 


Ni(e)  = 


C4-C1  T(e) 
C2 


C4 

,  N2(e)  =  — 

C2 


N3(e)  = 


C4“C3-C1  t<£) 

c2 


,  N4(e)  = 


C4-C3 

C2 


(C-5) 


*  4' 


C4-C2M  C4-C3-C2M 

Yl(e)  =  -  ,  Y2(£)  = 


Cl 


Cl 


